Hadoop Developer Resume
Charlotte, NC
SUMMARY
- Spark/Hadoop developer with 7 years of professional IT experience, including 4 years of Big Data experience with Hadoop ecosystem components spanning data ingestion, data modeling, querying, processing, storage, analysis, and data integration, and implementing enterprise-level systems that transform Big Data.
- Extensive experience in data extraction, transformation, loading, analysis, and visualization using the Cloudera platform (Spark, Scala, HDFS, Hive, Sqoop, Kafka, Oozie).
- Developed end-to-end Spark applications using Scala to perform data cleansing, validation, transformation, and summarization activities as required.
- Expertise in developing and implementing Spark programs in Scala on Hadoop to work with structured and semi-structured data.
- Utilized Spark for interactive queries, processing of streaming data, and integration with NoSQL databases for large volumes of data.
- Experience in extracting data from heterogeneous sources such as flat files, MySQL, and Teradata into HDFS using Sqoop, and vice versa.
- Broad experience working with structured data using Spark SQL, DataFrames, and HiveQL, optimizing queries, and incorporating complex UDFs into business logic (see the sketch at the end of this summary).
- Experience working with Text, Sequence files, XML, Parquet, JSON, ORC, AVRO file formats and Click Stream log files.
- Experienced in migrating ETL transformations using Spark jobs and Pig Latin Scripts.
- Experience in transferring Streaming data from different data sources into HDFS and HBase using Apache Kafka and Flume.
- Experience in using the Oozie scheduler and Unix scripting to implement cron jobs that execute different kinds of Hadoop actions.
- Good experience in optimization/performance tuning of Spark Jobs, PIG & Hive Queries.
- Comfortable with data architecture, including data ingestion pipeline design, Hadoop architecture, data modeling, data mining, and advanced data processing; experienced in optimizing ETL workflows.
- Excellent understanding of Spark Architecture and framework, Spark Context, APIs, RDDs, Spark SQL, Data frames, Streaming, MLlib.
- Solid understanding of Hadoop Gen1/Gen2 architecture and hands-on experience with Hadoop components such as JobTracker, TaskTracker, NameNode, Secondary NameNode, and DataNode, as well as the YARN architecture and its daemons (NodeManager, ResourceManager, ApplicationMaster) and the MapReduce programming paradigm.
- Hands on experience in using the Hue browser for interacting with Hadoop components.
- Good understanding and Experience with Agile and Waterfall methodologies of Software Development Life Cycle (SDLC).
- Good understanding of hash partitioning and experienced in identifying and fixing bugs.
- Highly motivated, self-learner with a positive attitude, willingness to learn new concepts and accepts challenges.
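A minimal sketch of the Spark SQL / DataFrame cleansing and UDF pattern referenced above, assuming a Spark 2.x SparkSession with Hive support; the table names, column names, and the UDF itself are hypothetical illustrations rather than production code.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object CleanseSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("customer-cleanse")
          .enableHiveSupport()
          .getOrCreate()

        // Hypothetical UDF: normalize free-text state codes before summarization.
        val normalizeState = udf((s: String) => if (s == null) "UNKNOWN" else s.trim.toUpperCase)

        val raw = spark.table("staging.customer_raw")            // hypothetical Hive table
        val cleansed = raw
          .filter(col("customer_id").isNotNull)                  // basic validation
          .withColumn("state", normalizeState(col("state")))     // transformation via UDF
          .dropDuplicates("customer_id")

        // Summarization: customer counts per state, written back as a Hive table.
        cleansed.groupBy("state").count()
          .write.mode("overwrite").saveAsTable("analytics.customers_by_state")

        spark.stop()
      }
    }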
TECHNICAL SKILLS
Big Data Technologies: Spark and Scala, Hadoop Ecosystem Components - HDFS, Hive, Sqoop, Impala, Flume, Map Reduce, Pig and Cloudera Hadoop Distribution CDH 5.13/5.8.2
Cloud Platforms: Amazon Web Services (AWS), Microsoft Azure and Google Cloud
Monitoring Tools: Cloudera Manager
Programming Languages: Scala, Java, SQL, PL/SQL, Python.
Scripting Languages: Shell Scripting, CSH.
NoSQL Databases: HBase
Databases: Oracle 11g, MySQL, MS SQL Server
Schedulers: Oozie
Operating Systems: Windows 7/8/10, Unix, Linux
Other Tools: Hue, IntelliJ IDEA, Eclipse, Maven, ZooKeeper
Front End Technologies: HTML5, XHTML, XML, CSS
PROFESSIONAL EXPERIENCE
Confidential, Charlotte, NC
Hadoop Developer
Responsibilities:
- Developed MapReduce programs for data extraction, transformation, and aggregation, and supported MapReduce jobs running on the cluster.
- Developed Pig scripts to migrate the existing home-loans legacy process to Hadoop, with the resulting data fed back to the retail legacy mainframe systems.
- Implemented solutions for ingesting data from various sources and processing the data using Big Data technologies such as Hive, Spark, Pig, Sqoop, HBase, and MapReduce.
- Worked on creating Combiners, Partitioners and Distributed cache to improve the performance of Map Reduce jobs.
- Developed Pig Scripts to generate Map Reduce jobs and performed ETL procedures on the data in HDFS.
- Experienced in handling Avro data files in HDFS by supplying schemas with Avro tools and MapReduce.
- Optimized MapReduce algorithms using combiners and hash partitioning to deliver the best results, and worked on application performance optimization for an HDFS cluster.
- Used the default hash partitioner and improved workflow throughput (see the partitioner sketch at the end of this section).
- Wrote Hive Queries to have a consolidated view of the telematics data.
- Orchestrated many Sqoop scripts, Pig scripts, Hive queries using Oozie workflows and sub workflows.
- Used Flume to collect, aggregate, and store the web log data from different sources like web servers and pushed to HDFS.
- Involved in creating Hive Tables, loading with data and writing Hive queries which will invoke and run Map Reduce jobs in the backend.
- Developed HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Involved in debugging Map Reduce jobs using MRUnit framework and optimizing Map Reduce jobs.
- Involved in troubleshooting errors in Shell, Hive, and MapReduce.
- Worked on debugging, performance tuning of Hive & Pig Jobs.
- Designed and implemented MapReduce jobs to support distributed processing using MapReduce, Hive, and Apache Pig.
- Created Hive external tables on the MapReduce output, then applied partitioning and bucketing on top of them.
Environment: Java 1.8, Scala 2.10.5, Apache Spark 1.6.0, MySQL, CDH 5.13, IntelliJ IDEA, Hive, HDFS, YARN, Map Reduce, Sqoop 1.4.3, Flume, Unix Shell Scripting, Python 2.6, Apache Kafka.
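A minimal sketch of the custom partitioner work described in the bullets above, written in Scala against the Hadoop MapReduce API; the key/value types and the class name are assumptions.

    import org.apache.hadoop.io.{IntWritable, Text}
    import org.apache.hadoop.mapreduce.Partitioner

    // Hash partitioner sketch (key/value types are assumptions): mirrors the
    // default hash behavior with a null-key guard, so records with the same key
    // reach the same reducer while load stays evenly spread across partitions.
    class AccountHashPartitioner extends Partitioner[Text, IntWritable] {
      override def getPartition(key: Text, value: IntWritable, numPartitions: Int): Int = {
        if (key == null) 0
        else (key.hashCode & Integer.MAX_VALUE) % numPartitions
      }
    }

It would be wired into a job with job.setPartitionerClass(classOf[AccountHashPartitioner]), alongside any combiner registered via job.setCombinerClass.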
Confidential, Dublin, OH
Spark/Hadoop Developer
Responsibilities:
- Involved in scripting Spark applications in Scala to perform data cleansing, validation, transformation, and summarization activities according to the requirements.
- Loaded the data into Spark RDDs and performed in-memory computation to generate output that exactly meets the requirements.
- Developed data pipelines using Spark, Hive and Sqoop to ingest, transform and analyze operational data.
- Developed Spark jobs, Hive jobs to encapsulate and transform data.
- Fine-tuned Spark applications to improve performance.
- Worked collaboratively to manage build outs of large data clusters and real time streaming with Spark.
- Used Spark for interactive queries, processing of streaming data, and integration with a popular NoSQL database for large volumes of data.
- Performance-tuned Spark jobs by adjusting configuration properties and using broadcast variables (see the broadcast sketch at the end of this section).
- Streamed data in real time using Spark with Kafka; responsible for handling streaming data from web server console logs.
- Developed the code for bucketing and hash partitioning of data.
- Performance-tuned long-running Greenplum user-defined functions; leveraged temporary tables to break the code into smaller sub-parts, loading intermediate results into temp tables and joining them later with the corresponding tables. Refined table distribution keys based on data granularity and primary-key column combinations.
- Worked with numerous file formats such as Text, Sequence files, Avro, Parquet, ORC, JSON, XML, and flat files using MapReduce programs.
- Extended the daily process to do incremental imports of data from DB2 and Teradata into Hive tables using Sqoop.
- Analyzed the SQL scripts and designed the solution to implement using Scala.
- Resolved performance issues in Hive and Pig scripts by analyzing joins, grouping, and aggregations and how they translate to MapReduce jobs.
- Worked with cross-functional consulting teams within the data science and analytics group to design, develop, and execute solutions that surface business insights and solve clients' operational and strategic problems.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Extensively used Hive/HQL or Hive queries to query data in Hive Tables and loaded data into HBase tables.
- Extensively worked on Partitions, Dynamic Partitioning, Bucketing tables in Hive, designed both Managed and External tables, also worked on optimization of Hive queries.
- Involved in collecting and aggregating large amounts of log data using Flume and staging data in HDFS for further analysis.
- Assisted analytics team by writing Pig and Hive scripts to perform further detailed analysis of the data.
- Designed Oozie workflows for job scheduling and batch processing.
Environment: Apache Spark, Scala, CDH 5.8.2, Hive, HDFS, YARN, Map Reduce, Sqoop, Flume, Unix Shell Scripting, Python, IntelliJ IDEA, MySQL, Apache Kafka.
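A minimal sketch of the broadcast-variable tuning pattern mentioned above, using the Spark 1.x-style RDD API; the file paths, record layouts, and lookup table are hypothetical.

    import org.apache.spark.{SparkConf, SparkContext}

    object BroadcastJoinSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("broadcast-join-sketch"))

        // Small dimension table, collected once and broadcast to every executor
        // instead of being shuffled in a regular join.
        val regionByCode: Map[String, String] =
          sc.textFile("/data/lookup/regions.csv")          // hypothetical path: code,region
            .map(_.split(","))
            .map(a => a(0) -> a(1))
            .collectAsMap()
            .toMap
        val regionLookup = sc.broadcast(regionByCode)

        // Large fact data set enriched map-side via the broadcast value, avoiding a shuffle.
        val enriched = sc.textFile("/data/events/*.csv")    // hypothetical path: id,code,amount
          .map(_.split(","))
          .map(a => (a(0), regionLookup.value.getOrElse(a(1), "UNKNOWN"), a(2).toDouble))

        enriched.take(5).foreach(println)
        sc.stop()
      }
    }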
Confidential, Boca Raton, FL
Hadoop Developer
Responsibilities:
- Worked on importing data from various sources and performed transformations using Map Reduce and Hive to load data into HDFS.
- Used Apache Ambari to communicate with Hadoop Eco System components.
- Developed multiple MapReduce jobs in PIG and HIVE for data cleaning and pre-processing.
- Hands-on experience with Hive queries and functions for evaluating, filtering, loading, and storing data.
- Developed HiveQL queries, mappings, tables, and external tables in Hive for analysis across different banners, and worked on partitioning, optimization, compilation, and execution.
- Created HBase tables to store variable data formats coming from different portfolios; performed real-time analytics on HBase using the Java API and REST API (see the HBase sketch at the end of this section).
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Involved in pivoting the HDFS data from rows to columns and columns to rows.
- Moved all log/text files generated by various products into HDFS location.
- Experienced in managing and reviewing the Hadoop log files.
- Analyzed HBase data in Hive by creating external partitioned and bucketed tables.
- Wrote MapReduce code that takes log files as input, parses them, and structures them in a tabular format to facilitate effective querying of the log data.
- Experienced in joining different data sets using Pig join operations and performing queries with Pig scripts.
- Used Oozie and ZooKeeper for workflow scheduling and monitoring.
- Actively involved in loading data from UNIX file system to HDFS.
- Used Sqoop for importing and exporting data into HDFS and HIVE.
- Used Flume to import the web logs.
- Developed Shell scripts to automate routine DBA tasks.
- Performed development reviews (code reviews) to ensure that code functionality matches business requirements and that standards are followed.
- Implemented test scripts for supporting test driven development and continuous integration.
- Followed Agile Methodology.
Environment: HDFS, Map Reduce, Pig, Hive, Sqoop, Flume, HBase, Java, Maven, IntelliJ, Agile, Unix and Shell Scripting.
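A minimal sketch of the HBase Java client usage referenced above (HBase 1.x client API assumed, shown in Scala); the table name, column family, and row key are hypothetical.

    import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
    import org.apache.hadoop.hbase.client.{ConnectionFactory, Get, Put}
    import org.apache.hadoop.hbase.util.Bytes

    object HBaseClientSketch {
      def main(args: Array[String]): Unit = {
        val connection = ConnectionFactory.createConnection(HBaseConfiguration.create())
        val table = connection.getTable(TableName.valueOf("portfolio_events"))  // hypothetical table
        try {
          // Write one cell: column family "d", qualifier "status", for a given row key.
          val put = new Put(Bytes.toBytes("row-0001"))
          put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("status"), Bytes.toBytes("ACTIVE"))
          table.put(put)

          // Read the same cell back.
          val result = table.get(new Get(Bytes.toBytes("row-0001")))
          val status = Bytes.toString(result.getValue(Bytes.toBytes("d"), Bytes.toBytes("status")))
          println(s"status=$status")
        } finally {
          table.close()
          connection.close()
        }
      }
    }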
Confidential, Deerfield, IL
Java/SQL Developer
Responsibilities:
- Prepared technical documentation for development of the code.
- Involved in the preparation of technical documentation for process flow.
- Developed many complex stored procedures, functions, and packages in PL/SQL, using advanced features like arrays, nested tables, and collections (invoked from the application layer as sketched at the end of this section).
- Worked with Oracle performance features such as the cost-based optimizer, execution plans, hints, indexes, clusters, partitioning, temporary tables, and views.
- Created tables, synonyms, and sequences; used database triggers to keep a history of inserts, updates, and deletes and to support audit routines; enforced database integrity using primary keys and foreign keys.
- Implemented server-side functionality using Java, servlets, and Hibernate, and developed business logic in Java using the J2EE API.
- Implemented Struts MVC framework.
- Modified existing functions and procedures and created new ones based on business requirements.
- Worked extensively on Exception handling to trouble-shoot PL/SQL code.
- Exposure to production implementation and code migration activities.
- Created effective data visualizations with Tableau.
- Used data sources from Excel, flat files, and intranet data, and developed datasets for Tableau.
- Created user-defined datasets, including datasets built with Custom SQL.
- Used action filters, parameters and calculated sets for preparing dashboards and worksheets in Tableau.
- Effectively used data blending features in Tableau.
- Created custom calculations and other calculations such as aggregations, date, and percent.
- Provided authentication and authorization for groups of users accessing Tableau files.
Environment: Oracle 10g RAC, OEM, SQL, PLSQL, Java, JDK, Unix Shell scripts, SQL navigator, AQT, SQL Loader, Tableau, Windows 2003.
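A minimal sketch of invoking a PL/SQL stored procedure of the kind described above from JVM application code over JDBC (written in Scala for consistency with the other sketches in this document, although the original work used Java); the procedure name, parameters, and connection details are hypothetical, and the Oracle JDBC driver is assumed to be on the classpath.

    import java.sql.{DriverManager, Types}

    object CallProcedureSketch {
      def main(args: Array[String]): Unit = {
        // Hypothetical connection details.
        val conn = DriverManager.getConnection(
          "jdbc:oracle:thin:@//dbhost:1521/ORCL", "app_user", "app_password")
        try {
          // Hypothetical procedure: audit_pkg.log_change(p_table IN VARCHAR2, p_rows OUT NUMBER)
          val stmt = conn.prepareCall("{ call audit_pkg.log_change(?, ?) }")
          stmt.setString(1, "EMPLOYEES")
          stmt.registerOutParameter(2, Types.NUMERIC)
          stmt.execute()
          println(s"rows audited: ${stmt.getInt(2)}")
          stmt.close()
        } finally {
          conn.close()
        }
      }
    }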
Confidential, Houston, TX
SQL Developer
Responsibilities:
- Created a new logical and physical database design to support multi-year conferences, covering tables for employees, histories, skills, experience, and payroll.
- Created Database Objects like Tables, Stored Procedures, Views, Triggers, Rules, Defaults, user defined data types and functions in SQL Server.
- Created PL/SQL tables and global variables, using IN and OUT parameters with %TYPE, %ROWTYPE, PL/SQL tables, and PL/SQL records.
- Extensively worked with Dynamic SQL, Composite data types & Global Temporary Tables.
- Developed user documentation for all the application modules. Also responsible for writing test plan documents and unit testing for the application modules.
- Created stored procedures and functions to support efficient data storage and manipulation.
- Created/ reallocated database objects on appropriate File groups to ease maintenance and improve data access performance.
- Implemented security by creating User logins, Roles and granting Users access to the database according to their role.
- Converted all Oracle ETL Packages to Informatica Mappings and created workflows/sessions.
- Used UTL JOB to automate the PL/SQL procedures and packages.
- Performed SQL, PL/SQL, and application tuning using tools such as TKPROF and AUTOTRACE.
- Developed scenarios for Unit, Integration testing to ensure that all the components work correctly when integrated.
- Actively involved in interacting with front-end Java developers and gathered user requirements and online system specifications.
Environment: Oracle 9i, Windows 2003, UNIX, SQL, PL/SQL, SQL Loader, Erwin and SQL Navigator.