Hadoop Developer Resume
Charlotte, NC
PROFESSIONAL SUMMARY:
- Professional software developer with around 6 years of experience in the IT industry, including 4+ years in Hadoop/Big Data technologies and 2 years of extensive experience in Spark, Python, Teradata, Informatica ETL development, and data analytics.
- Hands-on experience with the Big Data ecosystem, including Hadoop 2.0 (YARN) framework technologies such as HDFS, MapReduce, Sqoop, Hive, Oozie, Impala, ZooKeeper, and NiFi.
- Experience in using Cloudera and Hortonworks distributions.
- Experience in analyzing data using Spark SQL, HiveQL and custom MapReduce programs in Java.
- Experience in importing and exporting data between HDFS and relational database systems using Sqoop.
- Experience with file formats such as Avro, Parquet, RCFile, and ORC, and compression codecs such as Snappy and bzip2.
- Experience with Oozie scheduler in setting up workflow jobs with actions that run Hive and Sqoop jobs.
- Hands-on experience with relational databases such as Teradata, Oracle, and MySQL.
- Strong experience in unit testing and system testing of Big Data applications.
- Hands-on experience with Spark using Scala and Python.
- Hands-on experience working with JSON files.
- Hands-on experience with Spark architecture and its components, including Spark SQL and the DataFrame and Dataset APIs.
- Built a proof of concept for real-time Spark Streaming from Kafka into HDFS.
- Extensively used Spark SQL and the PySpark and Scala APIs to query and transform data residing in Hive (see the sketch after this list).
- Good knowledge on Amazon Web Services (AWS) cloud services like EC2, S3, EBS, RDS and VPC.
- Involved in database design, creating Tables, Views, Stored Procedures, Functions, Triggers and Indexes.
- Experience in developing complex SQL queries with unions, multi-table joins, and views.
- Experience in extracting, transforming, and loading (ETL) data from sources such as Oracle, SQL Server, and flat files using Informatica PowerCenter.
- Extensively worked on Informatica PowerCenter transformations such as Source Qualifier, Lookup, Filter, Expression, Router, Normalizer, Joiner, Update Strategy, Rank, Aggregator, Stored Procedure, Sorter, and Sequence Generator, with knowledge of Web Service Consumer, HTTP Transformation, and XML Source Qualifier.
- Experience in Object-Oriented Analysis and Design (OOAD) and development.
- Hands-on experience in application development using Java, RDBMS, Linux, and UNIX shell scripting.
- Hands-on experience with version control tools such as SVN, Bitbucket, and GitLab.
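A minimal PySpark sketch of the summary above (hypothetical table, column, and path names): querying a Hive table with Spark SQL, aggregating and joining DataFrames, and writing Snappy-compressed Parquet.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # Spark session with Hive support so Hive tables can be queried directly
    spark = (SparkSession.builder
             .appName("hive-spark-sql-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Query a Hive table with Spark SQL (sales_db.sales is a hypothetical table)
    sales = spark.sql("SELECT customer_id, amount FROM sales_db.sales")

    # DataFrame API: aggregate, then join with another Hive table
    customers = spark.table("sales_db.customers")
    totals = (sales.groupBy("customer_id")
                   .agg(F.sum("amount").alias("total_amount"))
                   .join(customers, "customer_id"))

    # Persist the results as Snappy-compressed Parquet on HDFS
    (totals.write.mode("overwrite")
           .option("compression", "snappy")
           .parquet("/data/output/customer_totals"))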
CORE COMPETENCIES:
- Hadoop Development & Troubleshooting
- Data Analysis
- Map Reduce Programming
- Performance Tuning of Hive & Impala
AREAS OF EXPERTISE:
Big Data Technologies: HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Impala, Oozie, Flume, ZooKeeper, Kafka, NiFi, Zeppelin.
Spark components: Spark Core, Spark SQL, PySpark.
AWS Cloud Services: S3, EBS, EC2, VPC, Redshift, EMR
Programming Languages: Java, Python.
Databases: Teradata, Oracle, MySQL, SQL Server.
ETL Tool: Informatica PowerCenter (8.x, 9.x)
Scripting and Query Languages: Unix Shell scripting, SQL.
Operating Systems: Windows, UNIX, Linux distributions, Mac OS.
Other Tools: Maven, Tableau, GitHub.
PROFESSIONAL EXPERIENCE:
Confidential, Charlotte, NC
Hadoop Developer
Responsibilities:
- Implemented a PySpark framework and UNIX scripts to build the workflow for the jobs.
- Involved in gathering business requirements, analyzing use cases, and implementing them end to end.
- Worked closely with the architect; enhanced and optimized product Spark and Python code to aggregate, group, and run data-mining tasks using the Spark framework.
- Loaded the raw data into RDDs and validated it.
- Converted the validated RDDs into DataFrames for further processing.
- Implemented Spark SQL logic to join multiple DataFrames and generate application-specific aggregated results (see the sketch below).
- Fine-tuned jobs for better performance in the production cluster.
- Worked extensively with Impala through Hue to analyze the processed data and generate end reports.
- Worked with the Hive database through Beeline.
- Analyzed and resolved production job failures in several scenarios.
- Implemented Spark SQL queries that combine Hive queries with the programmatic data manipulations supported by RDDs and DataFrames in Python.
- Implemented UNIX scripts to define the use-case workflow, process the data files, and automate the jobs.
- Knowledge of implementing Autosys JIL definitions to automate jobs in the production cluster.
Environment: Spark, Python, Hive, Sqoop, Oozie, UNIX scripting, Spark SQL, Impala, Hue, Beeline, Autosys.
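An illustrative sketch of the pattern described in this role, with hypothetical schema and paths: raw records are loaded into an RDD, validated, converted to a DataFrame, and joined with a Hive table through Spark SQL to produce aggregated results.

    from pyspark.sql import SparkSession, Row

    spark = (SparkSession.builder
             .appName("rdd-validation-sketch")
             .enableHiveSupport()
             .getOrCreate())
    sc = spark.sparkContext

    # Load raw delimited data into an RDD and keep only well-formed records
    raw = sc.textFile("/data/raw/transactions.csv")
    valid = (raw.map(lambda line: line.split(","))
                .filter(lambda f: len(f) == 3 and f[2].replace(".", "", 1).isdigit()))

    # Convert the validated RDD into a DataFrame and register it for SQL access
    txns = valid.map(lambda f: Row(txn_id=f[0], account_id=f[1], amount=float(f[2]))).toDF()
    txns.createOrReplaceTempView("txns")

    # Intermix Spark SQL with a Hive table (accounts_db.accounts is hypothetical)
    result = spark.sql("""
        SELECT a.region, SUM(t.amount) AS total_amount
        FROM txns t
        JOIN accounts_db.accounts a ON t.account_id = a.account_id
        GROUP BY a.region
    """)
    result.write.mode("overwrite").saveAsTable("reports_db.region_totals")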
Confidential, Jacksonville, FL
Hadoop Developer
Responsibilities:
- Involved in the design and development of migrating an existing application; conducted peer code reviews to make sure the code met the requirements and remained performant and efficient.
- Worked on performance tuning of Spark applications by setting the right batch interval time and the correct level of parallelism.
- Converted Hive/SQL queries into Spark transformations using Spark RDDs, DataFrames, and Python (see the sketch below).
- Built NiFi workflows for real-time data ingestion into Hadoop and Teradata at the same time.
- Extensively worked on Hive to analyze partitioned and bucketed data and compute various metrics for reporting.
- Worked with Hive to perform transformations, joins, and pre-aggregations before storing the data in HDFS.
- Imported customer-specific personal data into Hadoop using Sqoop from relational databases such as Netezza and Teradata.
- Ran queries using Impala and used BI tools to run ad-hoc queries directly on Hadoop.
- Worked with BI teams to generate reports and design ETL workflows in Tableau.
- Worked on SAS code to convert existing SAS datasets to the Hadoop environment.
- Managed jobs using the Autosys scheduler and developed job-processing scripts using Oozie workflows.
Environment: Hadoop, HiveQL, Teradata, NiFi, Apache Spark 1.6, Python, HDFS, Hive, Sqoop, shell scripting, Impala, SAS, MySQL.
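An illustrative sketch of the Hive-to-Spark conversion mentioned above, with hypothetical table and column names; it is written against the Spark 2.x SparkSession API for brevity, whereas the original Spark 1.6 work would have used HiveContext.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (SparkSession.builder
             .appName("hive-to-spark-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Original HiveQL, for reference:
    #   SELECT load_date, channel, COUNT(*) AS events, AVG(latency_ms) AS avg_latency
    #   FROM logs_db.click_events
    #   WHERE load_date = '2016-01-01'
    #   GROUP BY load_date, channel;

    # Equivalent Spark DataFrame transformations against a partitioned Hive table
    events = spark.table("logs_db.click_events").where(F.col("load_date") == "2016-01-01")
    metrics = (events.groupBy("load_date", "channel")
                     .agg(F.count("*").alias("events"),
                          F.avg("latency_ms").alias("avg_latency")))
    metrics.show()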
Confidential, Dallas, TX
Hadoop Developer
Responsibilities:
- Created Hive tables to store the processed results in a tabular format and wrote Hive scripts to transform and aggregate the disparate data (see the sketch below).
- Automated the extraction of data from warehouses and weblogs into Hive tables by developing workflow jobs in Oozie.
- Worked on migrating customers from Teradata to Hadoop and was involved in the Teradata decommission, which cut costs for the organization.
- Developed a utility to move data from production to lower environments using DistCp.
- Used Avro, Parquet, RCFile, and JSON file formats and developed UDFs for Hive and Pig.
- End-to-end development of the ETL process: sourcing data from upstream systems, performing complex transformations, and exporting the data to Teradata.
- Exported the aggregated data into an RDBMS using Sqoop for creating dashboards in Tableau and developed trend analyses using statistical features.
- Scheduled volume snapshots for backup, performed root-cause analysis of failures, and documented bugs and fixes for cluster downtime and maintenance.
- Followed the Agile Scrum methodology to manage and organize the team, with regular code review sessions.
Environment: Hadoop, Cloudera, HDFS, Hive, Sqoop, Shell-script, LINUX, Impala, Teradata.
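For illustration only, the Hive table-creation and aggregation pattern from this role re-expressed through Spark SQL in Python (the original work used Hive scripts scheduled by Oozie); table, partition, and path names are hypothetical.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-aggregation-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Create a partitioned Hive table to hold the processed results
    spark.sql("""
        CREATE TABLE IF NOT EXISTS reports_db.weblog_daily (
            page      STRING,
            hits      BIGINT,
            avg_bytes DOUBLE
        )
        PARTITIONED BY (load_date STRING)
        STORED AS PARQUET
    """)

    # Aggregate raw weblog data into the reporting table for one partition
    spark.sql("""
        INSERT OVERWRITE TABLE reports_db.weblog_daily PARTITION (load_date = '2016-01-01')
        SELECT page, COUNT(*) AS hits, AVG(bytes) AS avg_bytes
        FROM staging_db.weblogs
        WHERE load_date = '2016-01-01'
        GROUP BY page
    """)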
Confidential
Teradata/ETL Developer
Responsibilities:
- Responsible for gathering requirements from Business Analysts and Operational Analysts and identifying the data sources required for the requests.
- Proficient in importing/exporting large amounts of data from files to Teradata and vice versa.
- Developed the DW ETL scripts using BTEQ, Stored Procedures, and Macros in Teradata.
- Developed scripts for loading data into the base tables in the EDW using the FastLoad, MultiLoad, and BTEQ utilities of Teradata.
- Created numerous scripts with Teradata utilities BTEQ, MLOAD and FLOAD.
- Highly experienced in Performance Tuning and Optimization for increasing the efficiency of the scripts.
- Developed reports using advanced Teradata techniques such as RANK and ROW_NUMBER (see the sketch below).
- Worked on data verification and validation to ensure the generated data was appropriate, consistent, and in line with the requirements.
- Tested the database for field-size validation, check constraints, and stored procedures, and cross-verified the field sizes defined in the application against the metadata.
- Proficient in working with SET, MULTISET, derived, and volatile temporary tables.
- Designed and developed weekly, monthly reports related to the marketing and financial departments using Teradata SQL.
- Extracted data from existing data source and performed ad-hoc queries.
- Created complex mappings using Unconnected Lookup, Aggregator, and Router transformations to populate target tables efficiently.
- Created mapplets and used them in different mappings.
- Created events and tasks in the workflows using Workflow Manager.
Environment: Informatica 8.6.1, Teradata V12, BTEQ, MLOAD, FLOAD, TPUMP, TPT, Oracle, SQL, PL/SQL, UNIX, Windows XP.
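A brief sketch of the ranking report described above, shown in Python with the teradatasql driver purely for illustration (the original work used BTEQ scripts); connection details, table, and column names are hypothetical.

    import teradatasql

    # Rank customers by monthly spend with Teradata's ROW_NUMBER window function
    REPORT_SQL = """
        SELECT customer_id,
               month_id,
               total_spend,
               ROW_NUMBER() OVER (PARTITION BY month_id
                                  ORDER BY total_spend DESC) AS spend_rank
        FROM edw.monthly_spend
        QUALIFY ROW_NUMBER() OVER (PARTITION BY month_id
                                   ORDER BY total_spend DESC) <= 10
    """

    # Hypothetical connection details; the driver follows the Python DB-API
    with teradatasql.connect(host="tdprod.example.com", user="etl_user", password="***") as con:
        with con.cursor() as cur:
            cur.execute(REPORT_SQL)
            for row in cur.fetchall():
                print(row)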
Confidential
Software Engineer
Responsibilities:
- Involved in full life-cycle development in a distributed environment using Java and the J2EE framework.
- Responsible for developing and modifying the existing service layer based on the business requirements.
- Involved in designing & developing web-services using SOAP and WSDL.
- Created tables and stored procedures in SQL for data manipulation and retrieval; performed database modifications using SQL, PL/SQL, stored procedures, triggers, and views in Oracle 9i.
- Created User Interface using JSF.
- Involved in integration testing the Business Logic layer and Data Access layer.
- Integrated JSF with JSP and used JSF Custom Tag Libraries to display the value of variables defined in configuration files.
- Used technologies like JSP, JSTL, JavaScript, HTML, XML and Tiles for Presentation tier.
- Involved in JUnit testing of the application using JUnit framework.
- Wrote stored procedures, functions, and views to retrieve the data.
- Used Maven builds to wrap around Ant build scripts.
- Used CVS for version control of code and project documents.
- Responsible for mentoring and working with team members to make sure standards and guidelines were followed.
Environment: jQuery, JSP, Servlets, JSF, JDBC, HTML, JUnit, JavaScript, XML, SQL, Maven, Web Services, UML, WebLogic Workshop, CVS.