Hadoop Developer Resume
Charlotte, NC
SUMMARY
- Professional software developer with around 6 years of experience in the IT industry, including 4+ years of experience in Hadoop/Big Data technologies and 2 years of extensive experience in Spark, Python, Teradata, Informatica ETL development, and data analytics.
- Hands-on experience with the Big Data ecosystem, including Hadoop 2.0 (YARN) framework technologies such as HDFS, MapReduce, Sqoop, Hive, Oozie, Impala, ZooKeeper, and NiFi.
- Experience in using Cloudera and Hortonworks distributions.
- Experience in analyzing data using Spark SQL, HiveQL and custom MapReduce programs in Java.
- Experience in importing and exporting data between HDFS and relational database systems using Sqoop.
- Experience with different data file formats such as Avro, Parquet, RC, and ORC, and compression codecs such as Snappy and bzip2 (see the brief sketch after this summary).
- Experience with the Oozie scheduler in setting up workflow jobs with actions that run Hive and Sqoop jobs.
- Hands-on experience with relational databases such as Teradata, Oracle, and MySQL.
- Strong experience in unit testing and system testing in Big Data.
- Hands-on experience with Spark using Scala and Python.
- Hands-on experience working with JSON files.
- Hands-on experience with the Spark architecture and its components such as Spark SQL and the DataFrames and Datasets APIs.
- Built a POC on Spark real-time streaming from Kafka into HDFS.
- Extensively used Spark SQL and the PySpark and Scala APIs for querying and transforming data residing in Hive.
- Good knowledge of Amazon Web Services (AWS) cloud services such as EC2, S3, EBS, RDS, and VPC.
- Involved in database design, creating tables, views, stored procedures, functions, triggers, and indexes.
- Experience in developing complex SQL queries with unions and multi-table joins, and experience with views.
- Experience in extracting, transforming, and loading (ETL) data from various sources such as Oracle, SQL Server, and flat files using Informatica PowerCenter.
- Extensively worked on Informatica PowerCenter transformations such as Source Qualifier, Lookup, Filter, Expression, Router, Normalizer, Joiner, Update Strategy, Rank, Aggregator, Stored Procedure, Sorter, and Sequence Generator, with knowledge of Web Service Consumer, HTTP Transformation, and XML Source Qualifier.
- Experience in Object-Oriented Analysis and Design (OOAD) and development.
- Hands-on experience in application development using Java, RDBMS, Linux, and UNIX shell scripting.
- Hands-on experience with version control tools such as SVN, Bitbucket, and GitLab.
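As a brief illustration of the file format and compression experience above, the following PySpark sketch writes the same DataFrame as Snappy-compressed Parquet and as ORC. It is illustrative only; the paths and column layout are hypothetical, not taken from any project described here.

    # Illustrative sketch only; paths are hypothetical.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("format-sketch").getOrCreate()

    # Read a delimited source file into a DataFrame.
    df = spark.read.option("header", "true").csv("/data/in/customers.csv")

    # Write Snappy-compressed Parquet (columnar, splittable, works well with Hive/Impala).
    df.write.mode("overwrite") \
        .option("compression", "snappy") \
        .parquet("/data/out/customers_parquet")

    # Write ORC output as an alternative columnar format.
    df.write.mode("overwrite").orc("/data/out/customers_orc")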
TECHNICAL SKILLS
Big Data Technologies: HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Impala, Oozie, Flume, ZooKeeper, Kafka, NiFi, Zeppelin.
Spark Components: Spark Core, Spark SQL, and PySpark.
AWS Cloud Services: S3, EBS, EC2, VPC, Redshift, EMR
Programming Languages: Java, Python.
Databases: Teradata, Oracle, MySQL, SQL Server.
ETL Tool: Informatica PowerCenter (8.x, 9.x)
Scripting and Query Languages: Unix Shell scripting, SQL.
Operating Systems: Windows, UNIX, Linux distributions, Mac OS.
Other Tools: Maven, Tableau, GitHub.
PROFESSIONAL EXPERIENCE
Confidential, Charlotte, NC
Hadoop Developer
Responsibilities:
- Experienced in implementing a PySpark framework and UNIX scripting to build the workflow for the jobs.
- Involved in gathering business requirements, analyzing the use case, and implementing the use case end to end.
- Worked closely with the architect; enhanced and optimized product Spark and Python code to aggregate, group, and run data mining tasks using the Spark framework.
- Experienced in loading the raw data into RDDs and validating the data.
- Experienced in converting the validated RDDs into DataFrames for further processing.
- Implemented Spark SQL logic to join multiple DataFrames and generate application-specific aggregated results (see the sketch at the end of this section).
- Experienced in fine-tuning the jobs for better performance on the production cluster.
- Worked extensively in Impala through Hue to analyze the processed data and generate the end reports.
- Experienced in working with the Hive database through Beeline.
- Worked on analyzing and resolving production job failures in several scenarios.
- Implemented Spark SQL queries that intermix Hive queries with the programmatic data manipulations supported by RDDs and DataFrames in Python.
- Implemented UNIX scripts to define the use case workflow, process the data files, and automate the jobs.
- Knowledge of implementing JILs to automate the jobs in the production cluster.
Environment: Spark, Python, Hive, Sqoop, Oozie, UNIX scripting, Spark SQL, Impala, Hue, Beeline, Autosys.
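A minimal PySpark sketch of the RDD-to-DataFrame and Spark SQL join pattern described above, written against the Spark 2.x DataFrame API for readability. The input path, the accounts Hive table, and all column names are hypothetical.

    # Illustrative sketch only; the input path, Hive table, and column names are hypothetical.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("aggregation-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Load the raw delimited data into an RDD and validate the record layout.
    raw_rdd = spark.sparkContext.textFile("/data/raw/transactions")
    valid_rdd = (raw_rdd
                 .map(lambda line: line.split("|"))
                 .filter(lambda fields: len(fields) == 3)
                 .map(lambda fields: (fields[0], fields[1], float(fields[2]))))

    # Convert the validated RDD into a DataFrame for further processing.
    txns = spark.createDataFrame(valid_rdd, ["account_id", "txn_date", "amount"])
    txns.createOrReplaceTempView("txns")

    # Join with a Hive table and produce application-specific aggregates with Spark SQL.
    result = spark.sql("""
        SELECT a.region, SUM(t.amount) AS total_amount
        FROM txns t
        JOIN accounts a ON t.account_id = a.account_id
        GROUP BY a.region
    """)
    result.write.mode("overwrite").saveAsTable("reporting.region_totals")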
Confidential, Jacksonville, FL
Hadoop Developer
Responsibilities:
- Involved in the design and development of migrating an existing application; conducted peer reviews of the code to make sure that the requirements and the code are in sync and that the code is performant and efficient.
- Worked on performance tuning of Spark applications by setting the right batch interval time and the correct level of parallelism.
- Converted Hive/SQL queries into Spark transformations using Spark RDDs, DataFrames, and Python (see the sketch at the end of this section).
- Built NiFi workflows for real-time data ingestion into Hadoop and Teradata at the same time.
- Extensively worked on Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Worked with Hive to do transformations, joins, and some pre-aggregations before storing the data into HDFS.
- Imported all the customer-specific personal data into Hadoop using the Sqoop component from various relational databases such as Netezza and Teradata.
- Experienced in running queries using Impala and used BI tools to run ad-hoc queries directly on Hadoop.
- Worked with BI teams in generating the reports and designing ETL workflows in Tableau.
- Experience in working on the SAS code to convert the existing SAS datasets to the Hadoop environment.
- Experience in job management using the Autosys scheduler and developed job processing scripts using Oozie workflows.
Environment: Hadoop, HiveQL, Teradata, NiFi, Apache Spark 1.6, Python, HDFS, Hive, Sqoop, shell scripting, Impala, SAS, MySQL.
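A minimal sketch of the kind of Hive-to-Spark conversion mentioned above, written against the Spark 2.x DataFrame API for readability. The sales.orders table and its columns are hypothetical, not from the project itself.

    # Illustrative only; table and column names are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("hive-to-spark").enableHiveSupport().getOrCreate()

    # Original HiveQL (for reference):
    #   SELECT customer_id, SUM(amount) AS total
    #   FROM sales.orders
    #   WHERE order_date >= '2017-01-01'
    #   GROUP BY customer_id;

    # Equivalent DataFrame transformations in PySpark.
    orders = spark.table("sales.orders")
    totals = (orders
              .filter(F.col("order_date") >= "2017-01-01")
              .groupBy("customer_id")
              .agg(F.sum("amount").alias("total")))
    totals.show(20)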
Confidential, Dallas, TX
Hadoop Developer
Responsibilities:
- Created Hive tables to store the processed results in a tabular format and wrote Hive scripts to transform and aggregate the disparate data (see the sketch at the end of this section).
- Automated the process of extracting data from warehouses and weblogs into Hive tables by developing workflow jobs in Oozie.
- Worked on migrating customers from Teradata to Hadoop and was thus involved in the Teradata decommission, which in turn helped the organization cut costs.
- Developed a utility to move the data from production to lower lanes using DistCp.
- Experience in using Avro, Parquet, RCFile, and JSON file formats and developed UDFs using Hive and Pig.
- End-to-end development of the ETL process: sourcing the data from upstream, performing complex transformations, and exporting the data to Teradata.
- Exported the aggregated data into an RDBMS using Sqoop for creating dashboards in Tableau and developed trend analyses using statistical features.
- Scheduled snapshots of volumes for backup, performed root cause analysis of failures, and documented bugs and fixes for downtimes and cluster maintenance.
- Utilized the Agile Scrum methodology to manage and organize the team, with regular code review sessions.
Environment: Hadoop, Cloudera, HDFS, Hive, Sqoop, shell scripting, Linux, Impala, Teradata.
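A short PySpark sketch in the spirit of the Hive work above, writing aggregated results into a partitioned Hive table. The database, table, and column names are hypothetical and used only for illustration.

    # Illustrative only; database, table, and column names are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("hive-agg-sketch").enableHiveSupport().getOrCreate()

    # Aggregate weblog events per day and site section.
    events = spark.table("raw.web_events")
    daily = (events
             .groupBy("event_date", "site_section")
             .agg(F.count("*").alias("hits")))

    # Store the processed results as a Hive table partitioned by event_date.
    (daily.write
          .mode("overwrite")
          .partitionBy("event_date")
          .format("parquet")
          .saveAsTable("analytics.daily_section_hits"))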
Confidential
Teradata/ETL Developer
Responsibilities:
- Responsible for gathering requirements from Business Analysts and Operational Analysts and identifying the data sources required for the requests.
- Proficient in importing/exporting large amounts of data from files to Teradata and vice versa.
- Developed the DW ETL scripts using BTEQ, stored procedures, and macros in Teradata.
- Developed scripts for loading the data into the base tables in the EDW using the FastLoad, MultiLoad, and BTEQ utilities of Teradata.
- Created numerous scripts with the Teradata utilities BTEQ, MLOAD, and FLOAD.
- Highly experienced in performance tuning and optimization for increasing the efficiency of the scripts.
- Developed reports using Teradata advanced techniques such as RANK and ROW_NUMBER.
- Worked on data verifications and validations to evaluate whether the data generated according to the requirements is appropriate and consistent.
- Tested the database to check field size validation, check constraints, and stored procedures, and cross-verified the field sizes defined within the application against the metadata.
- Proficient in working with SET, MULTISET, derived, and volatile temporary tables.
- Designed and developed weekly and monthly reports for the marketing and financial departments using Teradata SQL.
- Extracted data from existing data source and performed ad-hoc queries.
- Created complex mappings using Unconnected Lookup, Aggregator, and Router transformations to populate target tables in an efficient manner.
- Created Mapplets and used them in different Mappings.
- Created events and tasks in the workflows using Workflow Manager.
Environment: Informatica 8.6.1, Teradata V12, BTEQ, MLOAD, FLOAD, TPUMP, TPT, Oracle, SQL, PL/SQL, UNIX, Windows XP.
Confidential
Software Engineer
Responsibilities:
- Involved in full life cycle development in a distributed environment using Java and the J2EE framework.
- Responsible for developing and modifying the existing service layer based on the business requirements.
- Involved in designing and developing web services using SOAP and WSDL.
- Created tables and stored procedures in SQL for data manipulation and retrieval; performed database modifications using SQL, PL/SQL, stored procedures, triggers, and views in Oracle 9i.
- Created User Interface using JSF.
- Involved in integration testing teh Business Logic layer and Data Access layer.
- Integrated JSF with JSP and used JSF Custom Tag Libraries to display the values of variables defined in configuration files.
- Used technologies like JSP, JSTL, JavaScript, HTML, XML and Tiles for Presentation tier.
- Involved in JUnit testing of the application using the JUnit framework.
- Wrote stored procedures, functions, and views to retrieve the data.
- Used Maven builds to wrap around Ant build scripts.
- Used CVS for version control of code and project documents.
- Responsible for mentoring and working with team members to make sure the standards and guidelines are followed.
Environment: jQuery, JSP, Servlets, JSF, JDBC, HTML, JUnit, JavaScript, XML, SQL, Maven, Web Services, UML, WebLogic Workshop, and CVS.