Big Data/Hadoop Developer Resume

Owings Mills, MD

SUMMARY

  • Over 5 years of IT experience, including five years in Hadoop environments, with good object-oriented programming skills.
  • Good knowledge of Hadoop Architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, MapReduce concepts and setting up standards and processes for Hadoop based application design and implementation.
  • Experience in working on BDPaaS (BigData Platform as a Service).
  • Experience in installation, configuration and deployment of Big Data solutions.
  • Hands on experience using Kafka, Spark, Cassandra.
  • Expertise with the tools in Hadoop Ecosystem including Pig, Hive, HDFS, MapReduce, Sqoop, Spark, Storm, Scala, Impala, Kafka, Yarn, Oozie, and Zookeeper.
  • Expertise in writing Hive and Pig programs, which run as MapReduce jobs, to validate and cleanse data in HDFS obtained from heterogeneous data sources and make it suitable for analysis.
  • Analyzed or transformed stored data by writing MapReduce jobs based on business requirements.
  • Experience in developing Pig scripts and Hive Query Language.
  • Experience with Hortonworks Hadoop distribution components and custom packages.
  • Hands on experience in Cloudera, MapR.
  • Good knowledge of Apache Spark.
  • Worked with heterogeneous structured and unstructured data sets.
  • Expertise in Talend.
  • Hands on experience in Python, Scala and Shell.
  • Strong familiarity with CCAR.
  • Wrote Hive queries for data analysis and to prepare data for visualization.
  • Hands on experience in Web Platform Development.
  • Experience in managing and scheduling batch jobs on a Hadoop cluster using Oozie and in managing metadata for Big Data.
  • Experience in using Apache Storm for real time processing.
  • Hands on experience with Teradata.
  • Hands on experience in JSON, Avro formats.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Hands on experience working with NoSQL databases including MongoDB and HBase.
  • Experience in optimizing MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
  • Experience in managing and reviewing Hadoop Log files.
  • Expertise in data-centric application development.
  • Experience in developing Pig UDFs to pre-process data for analysis.
  • Participated in multiple big data POCs to evaluate different architectures, tools and vendor products.
  • Used Zookeeper to provide coordination services to the cluster.
  • Experience in developing Pig Latin scripts to extract the data from the web server output files to load into HDFS.
  • Hands on experience in using Sqoop to import data into HDFS from RDBMS and vice-versa.
  • Hands on experience in application development using Big Data, Java, RDBMS and Linux/bash Shell Scripting and using application servers like Tomcat, WebLogic and Glassfish.
  • Hands on experience in using version control system Git.
  • Expertise in Web technologies using Core Java, J2EE, Servlets, EJB, JSP, JDBC, Java Beans, and Design Patterns.
  • Expertise in MVC and related frameworks: Struts MVC, Spring MVC, Hibernate, and JSF.
  • Experience in developing J2EE applications using various open source tools and a persistence framework (Hibernate), and in implementing JPA (Java Persistence API).
  • Hands on experience with SOAP, REST Web Services.
  • Detailed understanding of Software Development Life Cycle (SDLC) and sound knowledge of project implementation methodologies including Waterfall and Agile.
  • Experience in Test Driven Development and Test Automation.

TECHNICAL SKILLS

Operating Systems: Windows 8/7/XP, Unix, Ubuntu 13.X, Mac OSX

Hadoop Ecosystem: Hadoop 1.x/2.x (YARN), Hortonworks, HDFS, MapReduce, MongoDB, HBase, Hive, Pig, Zookeeper, Sqoop, Oozie, Spark, Storm, Scala, Impala, Flume, Avro, Talend, Eclipse, Cloudera-desktop and SVN

Java Tools: Java, MapReduce, J2EE (JSP, Servlets, EJB, JDBC, JMS, JNDI, RMI), Struts, Spring, Hibernate, AJAX, XML, XSLT, HTML, DHTML, JavaScript, jQuery, CSS, JUnit, JSON, Avro, Web Services, DOM, SAX

APIs: Servlets, EJB, REST, Java Naming and Directory Interface (JNDI), MapReduce

Development Tools: Eclipse, RAD/RSA (Rational Software Architect), IBM DB2 Command Editor, SQL Developer, Microsoft Suite (Word, Excel, PowerPoint, Access), Open Office Suite (Editor, Calc, etc.), VMware

Databases: MySQL, SQL, IBM DB2 9.x, Oracle 11g/10g

NoSQL Databases: HBase, Cassandra, MongoDB

Servers: SQL Server, WebSphere (WAS) 6.x/7.0, WebLogic 10-12c, Tomcat, GlassFish

Version Control: Git Bash, Bitbucket

Programming Languages: C, C++, Java, Python, Scala

PROFESSIONAL EXPERIENCE

Confidential

Big Data/Hadoop Developer

Responsibilities:

  • Validated data in the datalake by comparing data to the original sources of data.
  • Automated the process of validation thereby reducing the execution time.
  • Developed and implemented Unix shell scripts to get the metadata needed for validation.
  • Designed and performed various Data Quality checks on the source data using Talend and Pig.
  • Worked with the Business Analysts team to deliver data in their required format.
  • Used HBase snapshots to store huge amounts of data into the datalake.
  • Implemented Unix shell scripts that run in the background to retrieve large amounts of data and metadata without failure.
  • Actively involved in processing of the data in different domains.
  • Implemented the normalized view to make the relationships between the tables clear.
  • Performed various operations on the datalake of the MapR cluster, including moving, enriching, and validating data.
  • Handled data analysis on use cases, customer communication, incident management, and production support.
  • Maintained MapReduce jobs to keep the MapR datalake cluster running reliably.
  • Developed and implemented a Unix Shell script which retrieves the metadata of all the hive tables in a database.
  • Ingested large data sets from Cirrus source tables into Hive tables in Optum’s datalake using the Datafabric 2.0 framework.
  • Performed Data Quality checks.
  • Used Sqoop regularly to import data from Cirrus Source SQL Server to the MapR’s Datalake.
  • Created Hive table structures on top of HBase table data to provide updated records for datalake reporting.
  • Designed and implemented the Polaris Flattened Views (PFV) framework by de-normalizing the table structure of our data source, so that data spread across many tables can be retrieved from a few new tables while the relationships between the tables are preserved.
  • Designed and implemented the Polaris Flattened Views ETL Jobs using Talend to maintain, capture and update the data on the basis of History Load and Incremental Load in Development, Test and Production environments.
  • Represented my team to the business for the Polaris Flattened Views (PFVs).
  • Scheduled the Talend Jobs using Talend Administration Center (TAC) for Polaris Flattened Views Framework.
  • Implemented error logging to handle exceptions in the Talend jobs developed.
  • Validated data in the Flattened Views with an automated Talend job scheduled to run daily, confirming that the History Load and Incremental Load completed correctly.
  • Configured the project's Talend jobs to prompt for context parameters at run time.
  • Created the Flattened Views structure by de-normalizing the Cirrus source data models.
  • Generated bi-weekly reports of changes in the Cirrus source by running a macro.
  • Developed, maintained, executed and monitored the Talend jobs for the on-demand History Load followed by the daily Incremental Load, ensuring data availability in the PFV tenant across Development, Test and Production environments.
  • Stored huge data sets of the Polaris Flattened Views tables in the MapR’s Hadoop Distributed File System (HDFS).
  • Enriched data according to business requirements by developing Pig UDFs in Java and Python before storing it in the PFV tenant.
  • Optimized several Hive queries, improving the efficiency of job execution.
  • Developed a Java program to assess the impact of schema changes in our upstream databases on all the jobs in our project.

Environment: Hadoop, HDFS, Pig, HBase, Talend, Hive, MapReduce, Java, Python, Sqoop, Linux, Shell Scripting, BigData, MapR, Teradata, Java APIs, SQL, SQL Server.
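
The source-versus-datalake validation described in this role can be illustrated with a minimal Python sketch. It is not the project's actual code: the function names, the choice of SHA-256 fingerprints, and the assumption that each record is a tuple whose first column is a unique key are all hypothetical.

```python
import hashlib


def row_fingerprint(row):
    """Return a SHA-256 hex digest for one record (a tuple of column values).

    Assumption: None and the empty string are treated as equal.
    """
    joined = "\x1f".join("" if value is None else str(value) for value in row)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def compare_tables(source_rows, lake_rows, key_index=0):
    """Compare source records with datalake records, keyed by one column.

    Returns the keys missing from the lake, the keys present only in the
    lake, and the keys whose row contents differ.
    """
    source = {row[key_index]: row_fingerprint(row) for row in source_rows}
    lake = {row[key_index]: row_fingerprint(row) for row in lake_rows}
    shared = source.keys() & lake.keys()
    return {
        "missing_in_lake": sorted(source.keys() - lake.keys()),
        "extra_in_lake": sorted(lake.keys() - source.keys()),
        "mismatched": sorted(k for k in shared if source[k] != lake[k]),
    }
```

Comparing fingerprints rather than full rows keeps the comparison cheap when records are wide; on a real cluster the same idea would more likely be expressed as Hive or Pig jobs over both data sets.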

Confidential

Big-Data/Hadoop Developer

Responsibilities:

  • Designed, wrote and implemented Pig UDFs (User Defined Functions) to perform various transformations on the data in HDFS.
  • Designed and performed various Data Quality checks on the source data using Talend and Pig.
  • Designed and implemented Talend Jobs to create data, meta and control files from raw source files for further data processing.
  • Designed and implemented Talend Jobs for pre-processing, enrichment and Provisioning the data.
  • Created an Enrichment Engine in Talend that takes raw data as input, applies the enrichments specified by the user, and writes the enriched output to a pre-defined HDFS location.
  • Designed and executed various simple enrichments to the data using Pig and Java.
  • Used HBase to store huge amounts of data into the datalake.
  • Developed MapReduce code to support the jobs running on the cluster.
  • Used Teradata to validate Hive Tables.
  • Ingested and worked with huge datasets using Spark.
  • Developed and executed code for complex enrichments using UNIX shell scripting and Pig in Talend.
  • Used the CCAR (Comprehensive Capital Analysis and Review) framework.
  • Developed automated, maintainable solutions from deployment to production using Spark.
  • Used Shell scripting to filter out the data according to the business need.
  • Created external tables in Hive and loaded data into them from HDFS on the MapR cluster.
  • Designed a Talend job to load and store the data into HDFS using Pig and Shell Scripting.
  • Created HBase tables and validated the data loaded into HBase according to the business requirement.
  • Worked on RDDs in Spark.
  • Imported data from SQL Server and Oracle into MapR HDFS using Sqoop.
  • Designed and implemented various Pig UDFs in Java.
  • Using Sqoop, exported data from HDFS to SQL Server for provisioning.
  • Participated in reviewing the code for Pig Scripts, Pig UDFs and Talend jobs.
  • Performed Data Quality checks using Shell Scripting.
  • Actively participated in daily scrums, technical development meetings.
  • Wrote code to create and update tables in SQL.
  • Performed various operations on the datalake of the MapR cluster, including moving, enriching, and validating data.
  • Worked closely with the CCAR team to identify and resolve issues and dependencies.
  • Actively participated in Data Acquisition from different sources.
  • Performed SQL provisioning using Talend in two modes: full refresh and append.

Environment: Hadoop, HDFS, Pig, HBase, Talend, Hive, MapReduce, Java, Sqoop, Linux, Shell Scripting, BigData, MapR, Teradata, Java APIs, SQL, SQL Server, NoSQL.
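
The two SQL provisioning cases named in this role, full refresh and append, differ only in whether the target table is emptied before loading. The sketch below shows that difference using Python's built-in sqlite3 module as a stand-in for SQL Server. The `provision` function, the two-column table layout, and the mode names are assumptions made for illustration; the Talend jobs themselves are not reproduced here.

```python
import sqlite3


def provision(conn, table, rows, mode):
    """Load (id, value) rows into `table` using one of two modes.

    "full_refresh" deletes all existing rows before loading;
    "append" keeps existing rows and adds the new ones.
    Assumption: `table` is a trusted identifier, since table names
    cannot be bound as SQL parameters.
    """
    if mode not in ("full_refresh", "append"):
        raise ValueError(f"unknown provisioning mode: {mode}")
    with conn:  # commit on success, roll back on error
        if mode == "full_refresh":
            conn.execute(f"DELETE FROM {table}")
        conn.executemany(
            f"INSERT INTO {table} (id, value) VALUES (?, ?)", rows
        )
```

Running the delete and the inserts in one transaction means a failed full refresh leaves the previous contents in place instead of an empty table.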

Confidential, Owings Mills, MD

Big-Data/Hadoop Developer

Responsibilities:

  • Analyzed data on the Hadoop cluster using big data analytics tools including Hive, MapReduce, Pig and Kafka.
  • Involved in loading data from the Linux file system to HDFS.
  • Implemented Partitioning, Dynamic Partitions, Buckets in Hive.
  • Provided L3 engineering support, diagnosed and provided solutions for support requests for Hadoop in the project.
  • Scheduled Oozie workflows to run multiple Hive and Pig jobs.
  • Used Talend for Data Integration.
  • Optimized BigData performance in the cloud using Talend.
  • Used Hortonworks Hadoop distribution components and custom packages.
  • Wrote MapReduce code in Python.
  • Used Teradata to implement Hadoop in the project.
  • Exported the result set from Hive to MySQL using Shell scripts.
  • Monitored job executions regularly.
  • Developed Pig UDFs for functionality not available out of the box in Apache Pig.
  • Used Big Data for aggregation and extraction with ETL for Big Data Analytics.
  • Involved in processing ingested raw data using MapReduce, Apache Pig and Hive.
  • Imported and exported data between RDBMS and HDFS using Sqoop.
  • Analyzed data using Pig, writing Pig scripts to group, join and sort the data.
  • Used Spark for Big Data processing and Apache Storm for real time processing.
  • Used Impala for parallel SQL processing of data in the cluster.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data.
  • Developed Pig Latin Scripts to extract data from the web server output files to load into HDFS.
  • Analyzed data from logs, graphs and ERPs.
  • Configured Spark Streaming to receive real time data from Kafka and store the stream data in HDFS using Scala.
  • Worked on debugging, performance tuning of Hive & Pig Jobs.
  • Involved in developing Hive DDLs to create, alter and drop Hive tables.
  • Actively involved in code review and bug fixing for improving the performance.
  • Developed screens using JSP, DHTML, CSS, AJAX, JavaScript, Java and XML.
  • Supported MapReduce programs running on the cluster.
  • Extensively used Pig for data cleansing.
  • Used Kafka to publish messages.
  • Used the NoSQL databases Cassandra and MongoDB.
  • Computed metrics that define user experience, revenue, etc. using Java MapReduce.
  • Involved in using Sqoop for importing and exporting data into HDFS.
  • Actively participated in weekly meetings with the technical teams to review the code.

Environment: Hadoop, Big Data, HDFS, Pig, Hive, MapReduce, Sqoop, Kafka, Linux, Cloudera, Talend, Java APIs, Java collection, SQL, NoSQL, Cassandra, MongoDB, AJAX.
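
MapReduce code in Python, as mentioned in this role, typically follows a mapper and reducer pattern of the kind used with Hadoop Streaming. The sketch below runs that pattern locally on an in-memory list rather than on a cluster. The word-count task and the function names are illustrative assumptions, not the original jobs.

```python
from itertools import groupby
from operator import itemgetter


def mapper(lines):
    """Emit a (word, 1) pair for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1


def reducer(pairs):
    """Sum the counts for each key.

    Sorting by key stands in for the shuffle-and-sort phase that Hadoop
    performs between the map and reduce stages.
    """
    ordered = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield key, sum(count for _, count in group)


def run_job(lines):
    """Chain the mapper and reducer and return the totals as a dict."""
    return dict(reducer(mapper(lines)))
```

With Hadoop Streaming the mapper and reducer would usually be separate scripts reading standard input and writing tab-separated key/value lines; keeping them as generators here makes the logic easy to test without a cluster.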

Confidential

Big-Data/Hadoop Developer

Responsibilities:

  • Involved in installing and configuring the Hadoop ecosystem and Cloudera Manager using the CDH3 distribution.
  • Involved in creating Hive tables, loading the data and writing Hive queries that run internally as MapReduce jobs.
  • Managed Metadata with Big Data.
  • Involved in writing MapReduce jobs.
  • Used Talend for BigData Integration to generate the native code to work with Hadoop and Spark.
  • Used Impala for parallel processing of data.
  • Wrote MapReduce programs using Python.
  • Performed Data Serialization using Talend.
  • Streamed data in real time using Spark with Kafka.
  • Developed a data pipeline using Flume, Sqoop and Pig to extract data from weblogs and store it in HDFS.
  • Used Scala to write MapReduce programs.
  • Installed and configured Hive and wrote Hive UDFs.
  • Exported processed data from Hadoop to relational databases or external file systems using Sqoop, HDFS get or copyToLocal.
  • Developed data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Experienced in managing and reviewing Hadoop log files.
  • Used Pig to do transformations, event joins, bot traffic filtering and some pre-aggregations before storing the data in HDFS.
  • Experience as Data Engineer SME.
  • Extracted and updated data in MongoDB using the Mongo import and export command line utilities.
  • Wrote Hive queries to meet the business requirements.
  • Imported and exported data into HDFS and Hive using Sqoop and Kafka.
  • Worked on tuning the performance of Pig queries.
  • Supported MapReduce programs running on the cluster.
  • Involved in developing Pig Scripts for data change capture and delta record processing between newly arrived data and already existing data in HDFS.
  • Designed and Developed Dashboards using Tableau.
  • Involved in pivoting the HDFS data from Rows to Columns and Columns to Rows.

Environment: Hadoop, Big Data, MapReduce, MongoDB, Yarn, Hive, Pig, HBase, Oozie, Sqoop, Flume, Talend, Oracle 11g, Core Java, Cloudera, HDFS, Eclipse.
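
The row-to-column and column-to-row pivoting mentioned in this role can be sketched in Python as follows. The record layout of (row key, column name, value) triples and both function names are assumptions chosen for illustration; the original work would have operated on HDFS data through tools such as Pig or Hive.

```python
from collections import defaultdict


def rows_to_columns(records):
    """Pivot long-format (row_key, column, value) triples into one
    dictionary of columns per row key."""
    wide = defaultdict(dict)
    for row_key, column, value in records:
        wide[row_key][column] = value
    return {row_key: dict(columns) for row_key, columns in wide.items()}


def columns_to_rows(wide):
    """Unpivot the wide structure back into (row_key, column, value) triples."""
    return [
        (row_key, column, value)
        for row_key, columns in wide.items()
        for column, value in columns.items()
    ]
```

If the same row key and column appear more than once, the last value wins in this sketch; a production job would need an explicit rule for such duplicates.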
