- 8 years of extensive experience, including 4 years in Big Data across E-Commerce and Healthcare domains and 4 years in software development with ETL using Informatica.
- Hands-on experience using Hadoop technologies such as HDFS, Hive, Sqoop, and Impala.
- Hands-on experience writing MapReduce jobs through Hive and Pig.
- Experience importing and exporting data between external systems and HDFS using Sqoop, and using Hadoop ecosystem components to store and process data.
- Experience creating databases, tables, and views in HiveQL, Impala, and Pig Latin.
- Strong knowledge of MapReduce concepts.
- Around one year of experience with Spark and Scala.
- Hands-on experience working with ecosystem components such as Hive, Pig, and MapReduce.
- Strong knowledge of Hadoop, Hive, and Hive analytical functions.
- Efficient in building MapReduce programs using Hive and Pig.
- Involved in migrating data to the Hadoop stack from different databases (SQL Server 2008 R2, Oracle, and MySQL).
- Successfully loaded files into Hive and HDFS from MySQL.
- Loaded datasets into Hive for ETL operations.
- Good knowledge of Hadoop cluster architecture and cluster monitoring.
- Strong written, oral, interpersonal, and presentation communication skills.
- Ability to perform at a high level, meet deadlines, and adapt to ever-changing priorities.
- Extensive work experience with different SDLC approaches such as Waterfall and Agile development methodologies.
- Good communication and presentation skills.
- Ability to identify and resolve problems independently and quickly.
- Moved data between HDFS and relational databases in both directions using Sqoop (see the sketch after this list).
- Collected and aggregated large amounts of log data using Apache Flume, staging it in HDFS for further analysis.
- Commissioned and decommissioned nodes in an existing cluster.
- Analyzed and transformed data with Hive and Pig.
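A minimal sketch of the kind of Sqoop commands behind the HDFS-to-RDBMS transfers above; the host, database, table names, credentials, and paths are placeholders, not from an actual engagement.

```sh
# Import a MySQL table into HDFS (placeholder connection details)
sqoop import \
  --connect jdbc:mysql://dbhost:3306/salesdb \
  --username etl_user -P \
  --table orders \
  --target-dir /user/etl/orders \
  --num-mappers 4

# Export processed results from HDFS back into MySQL
sqoop export \
  --connect jdbc:mysql://dbhost:3306/salesdb \
  --username etl_user -P \
  --table order_summary \
  --export-dir /user/etl/order_summary
```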
Skills: Apache Hadoop, HDFS (Hadoop Distributed File System), Oracle, SQL, data warehousing, Informatica, Unix
Big Data/Hadoop Skills: HDFS, YARN, Sqoop, Flume, Pig, Hive; Spark: Spark Core, Spark Streaming, Spark SQL; NoSQL: HBase.
Programming Language: Java
Analytics Tools: Informatica, RDBMS, Oracle
Confidential - Reston, VA
- Analyzed large datasets to provide strategic direction to the company.
- Involved in analyzing the system and business requirements.
- Developed SQL statements to improve back-end communications.
- Loaded unstructured data into the Hadoop Distributed File System (HDFS).
- Created ETL jobs to load Twitter JSON data and server data into MongoDB, and moved the data from MongoDB into the data warehouse.
- Created reports and dashboards using structured and unstructured data.
- Involved in importing data from MySQL into HDFS using Sqoop.
- Involved in writing Hive queries to load and process data in HDFS.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs (see the sketch after this list).
- Involved in working with Impala for data retrieval.
- Exported data from Impala to the Tableau reporting tool and created dashboards on a live connection.
- Performed sentiment analysis on product reviews from the client's website.
- Exported the resulting sentiment analysis data to Tableau to create dashboards.
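An illustrative example of the Hive DDL and queries described above; the table and column names are hypothetical, and the aggregation is the kind of query Hive compiles into MapReduce jobs.

```sql
-- External table over raw tweet data staged in HDFS
CREATE EXTERNAL TABLE tweets (
  id BIGINT,
  user_name STRING,
  text STRING,
  created_at STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/user/etl/tweets';

-- Aggregation that Hive executes as a MapReduce job
SELECT user_name, COUNT(*) AS tweet_count
FROM tweets
GROUP BY user_name
ORDER BY tweet_count DESC
LIMIT 10;
```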
Environment: Cloudera CDH 4.3, Hadoop, MapReduce, HDFS, Hive, MongoDB, Sqoop, MySQL, SQL, Impala, Tableau.
Confidential - Boston, MA
- Managed and reviewed Hadoop log files.
- Tested raw data and executed performance scripts.
- Shared responsibility for administration of Hadoop, Hive, and Pig.
- Developed MapReduce programs to parse raw data, populate staging tables, and store the refined data in partitioned tables in the EDW.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
- Enabled speedy reviews and first-mover advantage by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data.
- Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes and resolved technical problems.
- Clearly understood the client's business requirements for the risk-rating and reporting modules.
- Worked on cluster setup, building 2-node and 5-node clusters with the CDH3 distribution.
- Involved in predictive data analysis using the K-Means algorithm (a sketch follows this list).
- Coordinated discussions with the customer and functional teams as needed to gather inputs.
- Worked closely with technology counterparts to communicate the business requirements.
- Performed application design and database design.
- Prepared technical design documents.
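A minimal sketch of a K-Means clustering job of the kind used for the prediction analysis above, written against the Spark MLlib API listed in this environment; the input path, feature columns, and value of k are assumptions for illustration.

```java
import org.apache.spark.ml.clustering.KMeans;
import org.apache.spark.ml.clustering.KMeansModel;
import org.apache.spark.ml.feature.VectorAssembler;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class RiskClustering {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("KMeansRiskRating").getOrCreate();

        // Hypothetical input: numeric risk features staged in HDFS
        Dataset<Row> data = spark.read().parquet("/user/etl/risk_features");

        // Combine the feature columns into a single vector column
        VectorAssembler assembler = new VectorAssembler()
                .setInputCols(new String[]{"exposure", "claims", "tenure"})
                .setOutputCol("features");
        Dataset<Row> features = assembler.transform(data);

        // Cluster the records into k groups for the risk-rating analysis
        KMeans kmeans = new KMeans().setK(5).setSeed(42L);
        KMeansModel model = kmeans.fit(features);
        model.transform(features).show(10);

        spark.stop();
    }
}
```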
Environment: Java, machine learning, Cloudera, Apache Hadoop, HDFS, Hive, Pig, Apache Spark, Spark Streaming, Spark SQL, Scala, Git.
Confidential - San Francisco, CA
- Led the Big Data analytics solution project to load data from source systems into the client's modern analytics platform.
- Analyzed and ingested policy, claims, billing, and agency data into the client's solution through multiple stages.
- Wrote multiple MapReduce programs for the extraction, transformation, and aggregation of data from different sources in multiple file formats, including XML, JSON, CSV, and other compressed formats.
- Assisted with data capacity planning and node forecasting.
- Worked on importing and exporting data between Oracle and DB2 and HDFS/Hive using Sqoop, and automated the Sqoop jobs by scheduling them in Oozie.
- Created Hive scripts to load data from one stage to the next and implemented incremental loads following the changed-data architecture.
- Created Hive tables as internal or external per requirements, defined with appropriate static or dynamic partitions and bucketing for efficiency (see the DDL sketch after this list).
- Performed data analysis and queries with Hive and Pig on Ambari (Hortonworks).
- Enhanced Hive performance by implementing optimization and compression techniques.
- Implemented Hive partitioning and bucketing to improve query performance in the staging layer, a denormalized form of the analytics model.
- Implemented techniques for efficient execution of Hive queries, such as map joins, compressed map/reduce output, and parallel query execution.
- Issued SQL queries via Impala to process the data stored in HDFS and HBase.
- Planned and reviewed deliverables; assisted the team with development and deployment activities.
- Involved in cluster setup meetings with the administration team.
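An illustrative sketch of the partitioned, bucketed Hive staging tables and the session-level tuning described above; the table, column, and variable names are placeholders.

```sql
-- External staging table, partitioned by load date and bucketed by key
CREATE EXTERNAL TABLE staging_claims (
  claim_id  BIGINT,
  policy_id BIGINT,
  amount    DECIMAL(12,2)
)
PARTITIONED BY (load_date STRING)
CLUSTERED BY (policy_id) INTO 32 BUCKETS
STORED AS ORC
LOCATION '/data/staging/claims';

-- Session settings of the kind used to speed up query execution
SET hive.auto.convert.join=true;                -- map joins for small tables
SET hive.exec.compress.output=true;             -- compress map/reduce output
SET hive.exec.parallel=true;                    -- run independent stages in parallel
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Incremental load into a dynamic partition
INSERT INTO TABLE staging_claims PARTITION (load_date)
SELECT claim_id, policy_id, amount, load_date
FROM raw_claims
WHERE load_date = '${hiveconf:run_date}';
```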
Environment: Apache Hadoop 2.2.0, Hortonworks, MapReduce, Hive, HBase, HDFS, Pig, Sqoop, Flume, Impala, Spark, Oozie, Kafka, MongoDB, UNIX, shell scripting, XML, JSON.
Confidential - Memphis, TN
- Extracted and updated data in HDFS using the Sqoop import and export command-line utility.
- Responsible for developing a data pipeline using Flume, Sqoop, and Pig to extract data from web logs and store it in HDFS.
- Involved in developing Hive UDFs for needed functionality.
- Involved in creating Hive tables, loading them with data, and writing Hive queries.
- Managed work including indexing data, tuning relevance, developing custom tokenizers and filters, and adding functionality such as playlists, custom sorting, and regionalization with the Solr search engine.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Used Pig for transformations, event joins, filtering bot traffic, and some pre-aggregations before storing the data in HDFS.
- Implemented advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark.
- Enhanced and optimized the product's Spark code to aggregate, group, and run data mining tasks using the Spark framework.
- Extended Hive functionality by writing custom UDFs (see the Java sketch after this list).
- Experience in managing and reviewing Hadoop log files.
- Developed data pipeline using Flume, Sqoop, pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Involved in emitting processed data from Hadoop to relational databases and external file systems using Sqoop.
- Orchestrated hundreds of Sqoop scripts, Pig scripts, and Hive queries using Oozie workflows and sub-workflows.
- Loaded cache data into HBase using Sqoop.
- Built custom Talend jobs to ingest, enrich, and distribute data in MapR and Cloudera Hadoop ecosystems.
- Created numerous external tables in Hive pointing to HBase tables.
- Analyzed HBase data in Hive by creating externally partitioned and bucketed tables.
- Worked with cache data stored in Cassandra.
- Ingested data from external and internal flow organizations.
- Used the external tables in Impala for data analysis.
- Supported MapReduce programs running on the cluster.
- Participated in Apache Spark POCs for analyzing sales data based on several business factors.
- Participated in daily scrum meetings and iterative development.
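A minimal sketch of a custom Hive UDF of the kind described above, using the classic org.apache.hadoop.hive.ql.exec.UDF API; the class name and normalization logic are hypothetical.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical UDF that normalizes free-text fields before joins.
// Registered in Hive with:
//   ADD JAR udfs.jar;
//   CREATE TEMPORARY FUNCTION normalize_text AS 'NormalizeText';
public class NormalizeText extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        // Lowercase and collapse runs of whitespace
        String cleaned = input.toString().toLowerCase().trim()
                .replaceAll("\\s+", " ");
        return new Text(cleaned);
    }
}
```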
Environment: Hadoop, MapReduce, HDFS, Pig, Hive, HBase, Impala, Sqoop, Oozie, Apache Spark, Java, Linux, SQL Server, Zookeeper, Tableau.
Confidential - Louisville, KY
- Developed ETL programs using Informatica to implement the business requirements.
- Communicated with business customers to discuss the issues and requirements.
- Created shell scripts to fine-tune the ETL flow of the Informatica workflows (see the sketch after this list).
- Used Informatica file-watch events to poll the FTP sites for the external mainframe files.
- Provided production support to resolve ongoing issues and troubleshoot problems.
- Performed performance tuning at the functional and mapping levels, using relational SQL wherever possible to minimize data transfer over the network.
- Effectively used Informatica parameter files for defining mapping variables, workflow variables, FTP connections, and relational connections.
- Involved in enhancements and maintenance activities of the data warehouse including tuning, modifying of stored procedures for code enhancements.
- Worked effectively in a version-controlled Informatica environment and used deployment groups to migrate objects.
- Used the Debugger to identify bugs in existing mappings by analyzing data flow and evaluating transformations.
- Used pre- and post-session assignment variables to pass variable values from one session to another.
- Designed workflows with many sessions using Decision, Assignment, Event-Wait, and Event-Raise tasks, and used the Informatica scheduler to schedule jobs.
- Reviewed and analyzed functional requirements and mapping documents; performed problem-solving and troubleshooting.
- Performed unit testing at various levels of the ETL and was actively involved in team code reviews.
- Identified problems in existing production data and developed one-time scripts to correct them.
- Fixed invalid mappings and troubleshot technical problems in the database.
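A minimal sketch of the kind of shell wrapper used to control Informatica workflows through the pmcmd command-line tool; the domain, service, folder, workflow, and parameter-file names are placeholders.

```sh
#!/bin/sh
# Start an Informatica workflow and fail the script if the run fails.
# All names below are illustrative, not from an actual repository.
PARAM_FILE=/opt/etl/params/wf_load_sales.par

pmcmd startworkflow \
  -sv INT_SVC -d DOMAIN_DEV \
  -u "$INFA_USER" -p "$INFA_PWD" \
  -f FOLDER_EDW \
  -paramfile "$PARAM_FILE" \
  -wait wf_load_sales

if [ $? -ne 0 ]; then
  echo "wf_load_sales failed" >&2
  exit 1
fi
```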