Hadoop Developer Resume
PROFESSIONAL SUMMARY:
- A dedicated, assertive, and qualified technology professional working as a Hadoop Developer.
- 9 years of overall IT experience in application development across Big Data (Hadoop) and DataStage.
- 5 years of dedicated experience in Big Data Hadoop and its components, including HDFS, MapReduce, Apache Pig, Hive, Sqoop, HBase, Impala, Flume, Oozie, Spark, Spark Streaming, and Kafka.
- 2 years of experience as an ETL developer using DataStage.
- Extensive experience in setting up Hadoop clusters.
- Good working experience with HDFS, Hive, Pig, MapReduce jobs, Impala, Flume, Sqoop, and Oozie.
- Experience importing/exporting data from existing RDBMS systems.
- Good knowledge of Oracle 10g, MySQL, and NoSQL databases.
- Good exposure to Hadoop's query programming models (Pig and Hive).
- Good experience in PySpark.
- Used JSON and XML SerDes for serialization and deserialization to load JSON and XML data into Hive (see the sketch after this summary).
- Used Avro, Parquet, and ORC data formats to store data in GCP.
- Good knowledge of writing Python applications using libraries such as Pandas, NumPy, SciPy, and Matplotlib.
- Good knowledge of HCatalog, Impala, and NoSQL databases (MongoDB and HBase).
- In-depth understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, DataNode, NameNode, and MapReduce concepts.
- Experience with single-node and multi-node cluster configurations, and with commissioning and decommissioning nodes in the cluster.
- Experienced in designing, developing, documenting, and testing ETL jobs using DataStage.
- Experienced in integrating various data sources (Oracle, datasets, sequential files, and multi-format flat files) into the data staging area.
- Good exposure to handling stages such as Sequential File, Dataset, Lookup, Join, Sort, Transformer, Aggregator, and Funnel.
- Participated in client calls to gather requirements and involved in preparing design, mapping, and unit test case documents.
- Performed activities such as importing and exporting DataStage projects.
- Good exposure to data warehousing concepts, UNIX, and Oracle.
- Excellent communication, interpersonal, and analytical skills, with a strong ability to perform as part of a team.
- Exceptional ability to learn new concepts.
- Willing to go the extra mile to achieve excellence.
- Extensive experience in high-end technical areas including Siebel configuration, workflows, and scripting, with a good understanding of Siebel architecture.
- Salesforce certified; trained in Salesforce by the organization, with good knowledge of implementing POCs in Salesforce.
- Quick to learn new technologies and ready to meet management expectations.
- Experience working on and contributing to several POC implementations.
- Worked on Siebel CRM and have POC-level knowledge of Salesforce.
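To illustrate the Hive SerDe and columnar-format bullets above, here is a minimal PySpark sketch; the table name, schema, and HDFS paths are hypothetical, and it assumes the hive-hcatalog-core JAR (which provides the JSON SerDe) is on the classpath.

```python
from pyspark.sql import SparkSession

# Minimal sketch: expose raw JSON files through a Hive table via the JSON SerDe,
# then persist a copy in Parquet. Table name, columns, and paths are hypothetical.
spark = (
    SparkSession.builder
    .appName("json-serde-sketch")
    .enableHiveSupport()
    .getOrCreate()
)

# External Hive table over the raw JSON files, using the Hive JSON SerDe.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS raw_events (
        event_id STRING,
        event_ts STRING,
        payload  STRING
    )
    ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
    LOCATION '/data/raw/events'
""")

# Re-store the same data in Parquet for faster downstream queries.
spark.table("raw_events").write.mode("overwrite").parquet("/data/curated/events")
```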
TECHNICAL SKILLS:
Languages: Core Java, Pig Latin, eScript.
Operating Systems: Windows and Linux.
Big Data Technologies: Hadoop, Spark, Hive, PySpark, Pig, Sqoop, HUE, Oozie, Impala, Flink
RDBMS: Oracle 10g and MySQL 5.5.35.
Distributed Database (NoSQL): HBase, MongoDB.
IDEs: Eclipse 3.7.2.
Other Technologies: Salesforce, Siebel.
Domain knowledge: Siebel Communications, Call Centre, Financial Services, Banking
PROFESSIONAL EXPERIENCE:
Confidential
Hadoop Developer
Responsibilities:
- Requirement analysis, design, and development.
- Analyzed the data from the SOR (system of record).
- Prepared the Xwalk and bridge documents based on the PDM and data lineage documents to create the APPID used to load the data into HBase and the Gremlin graph DB.
- Modified the existing MDFE Spark JAR to load data from the CDF and GLIF data sources using the Xwalk and bridge (BD) documents.
- Developed Xwalk and BD documents for 12 SOR feed files.
- Developed Maestro jobs for automation.
- Validated data loads in HBase and Gremlin using the APPID (see the validation sketch after this list).
- Unit testing and defect fixing.
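The HBase load validation mentioned above could be scripted roughly as below; this is only a sketch using the happybase Thrift client, and the host, table name, and APPID row-key convention are assumptions rather than the actual project details.

```python
import happybase

# Sketch: check that a record keyed by APPID landed in HBase.
# Thrift host, table name, and row-key format are hypothetical.
connection = happybase.Connection("hbase-thrift-host", port=9090)
table = connection.table("entity_table")

def validate_appid(appid: str) -> bool:
    """Return True if a row for this APPID exists and contains cells."""
    row = table.row(appid.encode("utf-8"))
    if not row:
        print(f"APPID {appid} not found in HBase")
        return False
    print(f"APPID {appid}: {len(row)} cells loaded")
    return True

validate_appid("APPID-000123")
connection.close()
```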
Confidential, Los Angeles, CA
Hadoop Developer
Responsibilities:
- Involved in requirement gathering, project documentation, design documents, and production deployment.
- Developed Spark jobs in Java for generic reports.
- Developed Spark jobs using PySpark for the phase 2 inventory project.
- Actively participated in the daily requirement clarification and scrum calls.
- Analyzed the source data tables from Oracle and MySQL and imported the data into Hadoop using Sqoop.
- Involved in designing the data models, processing the data using Impala, and writing shell scripts to execute the Impala queries based on the conditions.
- Exported the data tables back to the MySQL database.
- Implemented Hive tables and HQL queries for the reports.
- Converted the MySQL stored procedures to Impala ETL jobs for faster performance.
- Unit testing and defect fixing.
- Designed the Hadoop data lake and developed a data ingestion framework supporting RDBMS and flat-file sources (a minimal PySpark ingestion sketch follows this list).
- Migrated back-dated data and ingested daily data into the data lake from multiple sources.
- Generated canned reports on the daily data using HQL and Impala queries and handed the reports over to business users.
- Developed utility tools for reconciliation using shell scripting, Spark, and Python.
- Categorized the large volume of data in the data lake per business requirements using Spark to improve canned-report generation performance and query execution time.
- Supported the UAT and production environments.
- Handled data type mappings between RDBMS and Hive without any data loss.
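A minimal PySpark rendering of the ingestion pattern described in this list (the production pipeline also used Sqoop and shell scripts); the JDBC URL, credentials, table, and data lake path are placeholders.

```python
from pyspark.sql import SparkSession, functions as F

# Sketch of one data lake ingestion step: pull a table from MySQL over JDBC,
# stamp it with a load date, and land it as Parquet. All names are placeholders.
spark = SparkSession.builder.appName("rdbms-ingest-sketch").enableHiveSupport().getOrCreate()

source_df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:mysql://mysql-host:3306/inventory")
    .option("dbtable", "daily_inventory")
    .option("user", "etl_user")
    .option("password", "etl_password")
    .load()
)

(
    source_df.withColumn("load_date", F.current_date())
    .write.mode("append")
    .partitionBy("load_date")
    .parquet("/datalake/raw/inventory/daily_inventory")
)
```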
Confidential
Hadoop Developer
Responsibilities:
- Worked on enhancements, change requests, and defect fixes for the Confidential phase 1 release using Spark with Python.
- Analyzed the source data tables from Oracle and MySQL and imported the data into Hadoop using Sqoop.
- Involved in designing the data models, processing the data using Impala, and writing shell scripts to execute the Impala queries based on the conditions.
- Exported the data tables back to the MySQL database.
- Implemented Hive tables and HQL queries for the reports.
- Converted the MySQL stored procedures to Impala ETL jobs for faster performance.
- Unit testing and defect fixing.
- Implemented a POC for Neilson using Spark Streaming and Kafka (a streaming sketch follows this list).
- Implemented a POC on MongoDB.
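The Spark Streaming and Kafka POC above follows the standard Kafka consumer pattern; the sketch below uses Structured Streaming rather than the older DStream API, and the broker, topic, and output paths are assumed values.

```python
from pyspark.sql import SparkSession

# Sketch of a Kafka consumer in Spark Structured Streaming (the POC itself may
# have used the DStream API). Broker, topic, and paths are assumptions.
spark = SparkSession.builder.appName("kafka-streaming-sketch").getOrCreate()

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("subscribe", "viewership-events")
    .option("startingOffsets", "latest")
    .load()
    .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")
)

query = (
    events.writeStream.format("parquet")
    .option("path", "/datalake/streaming/viewership")
    .option("checkpointLocation", "/datalake/checkpoints/viewership")
    .outputMode("append")
    .start()
)
query.awaitTermination()
```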
Confidential
Hadoop Developer
Responsibilities:
- Involved in business client calls, requirement analysis, and design.
- Actively involved in business requirement discussions with the onsite counterpart and the offshore team for the implementation, and coordinated all project-related activities.
- Developed Sqoop scripts to import/export data between HDFS and the MySQL database.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Supported MapReduce programs running on the cluster.
- Analyzed large data sets by running Hive queries and Pig scripts.
- Worked on tuning the performance of Pig queries.
- Involved in developing Pig scripts for processing data.
- Wrote Hive queries to transform the data into tabular format and processed the results using Hive Query Language.
- Created Pig Latin scripts to sort, group, join, and filter the enterprise-wide data (a PySpark rendering of this pattern follows this list).
- Analyzed the functional specifications.
- Implemented Pig scripts according to business rules.
- Implemented Hive tables and HQL queries for the reports.
- Unit testing and defect fixing.
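Purely for illustration, the sort/group/join/filter pattern implemented in the Pig Latin scripts above looks roughly like this when rendered as a PySpark DataFrame job; the input paths and column names are made up.

```python
from pyspark.sql import SparkSession, functions as F

# PySpark rendering of the Pig-style filter/join/group/sort pattern.
# Input paths and column names are illustrative only.
spark = SparkSession.builder.appName("pig-pattern-sketch").getOrCreate()

orders = spark.read.option("header", True).csv("/data/enterprise/orders")
customers = spark.read.option("header", True).csv("/data/enterprise/customers")

report = (
    orders.filter(F.col("status") == "COMPLETED")                  # FILTER
    .join(customers, on="customer_id", how="inner")                # JOIN
    .groupBy("region")                                             # GROUP
    .agg(F.sum(F.col("amount").cast("double")).alias("total_amount"))
    .orderBy(F.col("total_amount").desc())                         # ORDER
)
report.show()
```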
Confidential
Hadoop Developer
Responsibilities:
- Involved in requirement analysis.
- Extensively involved in installing and configuring the Cloudera distribution of Hadoop, including its NameNode, Secondary NameNode, JobTracker, TaskTrackers, and DataNodes.
- Worked on analyzing the Hadoop stack and various big data analytics tools, including Pig, Hive, the HBase database, and Sqoop.
- Created Pig Latin scripts to sort, group, join, and filter the enterprise-wide data.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Supported MapReduce programs running on the cluster.
- Analyzed large data sets by running Hive queries and Pig scripts.
- Worked on tuning the performance of Pig queries.
- Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, JVM tuning, and map/reduce slot configuration.
- Involved in exploring Hadoop MapReduce programming and cluster configuration and installation.
- Wrote Pig scripts for pre-processing customer data.
- Developed MapReduce programs for batch processing and customer data analysis to suggest the best plan for each customer (a streaming-style sketch follows this list).
- Worked on tuning the performance of Hive and Pig queries.
- Wrote Java code for custom partitioners and Writables.
- Integrated Hive with Pentaho for data reports.
- Unit tested the web application and Hadoop programs.
- Involved in bug fixing and testing of the application.
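The batch MapReduce jobs in this project were written in Java (including the custom partitioners and Writables); purely as an illustration of the aggregation pattern, the same idea is sketched below as a Hadoop Streaming mapper and reducer in Python, with the input layout (customer_id, tab, usage) assumed.

```python
#!/usr/bin/env python
# Hadoop Streaming sketch of the batch aggregation pattern; the real jobs were
# Java MapReduce. Assumes tab-separated input lines: customer_id<TAB>usage.
import sys

def mapper():
    # Emit customer_id and usage as a tab-separated key/value pair.
    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            print(f"{fields[0]}\t{fields[1]}")

def reducer():
    # Input arrives sorted by key; sum usage per customer.
    current_key, total = None, 0.0
    for line in sys.stdin:
        key, value = line.rstrip("\n").split("\t", 1)
        if key != current_key:
            if current_key is not None:
                print(f"{current_key}\t{total}")
            current_key, total = key, 0.0
        total += float(value)
    if current_key is not None:
        print(f"{current_key}\t{total}")

if __name__ == "__main__":
    mapper() if sys.argv[1] == "map" else reducer()
```

Such a script would be submitted with the hadoop-streaming JAR (exact path depends on the distribution), passing it as both the -mapper ("script.py map") and -reducer ("script.py reduce") commands.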
Confidential
DataStage Developer
Responsibilities:
- Involved in preparing the low-level design document.
- Actively involved in business requirement discussions with the onsite counterpart and the offshore team for the implementation, and coordinated all project-related activities.
- Involved in documenting all the initial-level analysis done before designing the jobs.
- Involved in developing parallel jobs to implement Slowly Changing Dimension Type 2 logic using Change Data Capture from RMW tables to data mart tables.
- Involved in testing all the developed staging-to-RMW jobs.
- Prepared unit test documents for the designed data mart and RMW jobs.
- Captured and documented all supporting activities.
Confidential
Configurator/Upgrade and Support
Responsibilities:
- Worked on pre-upgrade and post-upgrade tasks.
- Fixed defects as per the 7.8 functionality.
- Worked on DataStage Designer, Manager, Administrator, and Director.
- Worked with the business analysts and DBAs on requirements gathering, analysis, testing, metrics, and project coordination.
- Involved in extracting data from different data sources such as Oracle and flat files.
- Involved in creating and maintaining sequencer and batch jobs.
- Created ETL job flow designs.
- Used ETL to load data into the Oracle warehouse.
- Created various standard/reusable jobs in DataStage using active and passive stages such as Sort, Lookup, Filter, Join, Transformer, Aggregator, Change Data Capture, Sequential File, and Data Set.
- Tested and fixed defects in the List Management module and performed Event Management performance tuning.
- Tested inbound and outbound web services using SoapUI.