Sr Big Data Engineer Resume
Raritan, NJ
SUMMARY
- 8 years of IT industry experience encompassing a wide range of skill sets.
- 5 years of experience working with Big Data technologies on highly distributed systems comprising several applications and massive amounts of data, using the Cloudera and Hortonworks Hadoop distributions.
- Experience importing and exporting data between HDFS/Hive and relational database systems like MySQL using Sqoop and Spark.
- Experience running Oozie jobs daily, weekly, or bi-monthly as needed by the business, which execute as MapReduce jobs.
- Executed complex HiveQL queries for required data extraction from Hive tables.
- Experience developing MapReduce jobs for data cleaning and data manipulation as required by the business.
- Experience administering and monitoring Hadoop clusters, including commissioning and decommissioning of nodes, file system checks, cluster maintenance, and upgrades.
- Good knowledge of NoSQL databases including HBase.
- Installation, configuration, and administration experience on Big Data platforms using Cloudera Manager.
- Involved in efficiently maintaining and analyzing large data sets in the terabyte range.
- Succeeded in running Spark in YARN cluster mode to improve performance.
- Monitored MapReduce jobs and YARN applications.
- Extensive experience working with Oracle, MS SQL Server, DB2, and MySQL.
- Experienced in SDLC, Agile, and Hybrid methodologies.
- Ability to meet deadlines without compromising the quality of deliverables.
- Excellent communication, interpersonal, and problem-solving skills; a team player.
- Ability to quickly adapt to new environments and technologies.
TECHNICAL SKILLS
Big Data Technologies: HDFS, Hadoop, Hive, Oozie, ZooKeeper, Impala, Sqoop, MapReduce, Tez, Spark, Flume, HBase, Solr, Kafka, YARN, Pig, Avro, Storm, Kudu
Distributions: Cloudera, Hortonworks
Monitoring Tools: Cloudera Manager, Ambari
Programming Languages: Python, R, SQL, Java, HiveQL, Shell Scripting, Scala
Databases: NoSQL (HBase, MapR-DB, MongoDB), Oracle 12c/11g, MySQL, DB2, MS SQL Server
Operating Systems: Windows, Linux (RHEL, CentOS, Ubuntu)
PROFESSIONAL EXPERIENCE
Confidential, Raritan, NJ
Sr Big Data Engineer
Responsibilities:
- Leading a team of seven, including an offshore team, to migrate data from an Oracle database to HDFS and Kudu.
- Developing code to ingest data into HDFS and Kudu using Sqoop, Spark, and Scala (see the sketch after this list).
- Converted stored procedures into equivalent HQL code to run on Impala.
- Developed data pipelines utilizing technologies like Spark, Sqoop, and Oozie.
- Developing reports in Qlik Sense.
- Working closely with the Artificial Intelligence team.
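A representative sketch of this Oracle-to-HDFS/Kudu ingestion in Spark and Scala; the connection details, table names, and HDFS path are placeholders, and the Kudu write assumes the kudu-spark connector's "kudu" data source:

    import org.apache.spark.sql.SparkSession

    object OracleToKuduIngest {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("oracle-to-kudu").getOrCreate()

        // Pull the source table from Oracle over JDBC (placeholder connection details).
        val orders = spark.read.format("jdbc")
          .option("url", "jdbc:oracle:thin:@//oracle-host:1521/ORCL")
          .option("dbtable", "SALES.ORDERS")
          .option("user", "etl_user")
          .option("password", sys.env("ORACLE_PWD"))
          .load()

        // Land a raw copy on HDFS as Parquet for downstream processing.
        orders.write.mode("overwrite").parquet("/data/raw/orders")

        // Insert the same records into a Kudu table queried through Impala.
        orders.write.format("kudu")
          .option("kudu.master", "kudu-master:7051")
          .option("kudu.table", "impala::default.orders")
          .mode("append")
          .save()

        spark.stop()
      }
    }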
Environment: Hortonworks, Hive, Sqoop, Spark, Spark SQL, ZooKeeper, Oozie, HBase, Scala, Eclipse, Pig, DataStage, Kafka, Linux, and R.
Confidential, Portland, OR
Big Data Engineer
Responsibilities:
- Primarily responsible for building a Hadoop data lake that integrates data from clinical, omics, imaging, and streaming data sources for research purposes while meeting regulatory requirements such as HIPAA and PHI.
- Architected, installed, and maintained a Hadoop cluster on a large-scale distributed system.
- Participated in data collection, data cleaning, data mining, developing models and visualizations.
- Developed Oozie workflows to orchestrate Sqoop jobs that move data from different sources like Oracle, MySQL, and MS SQL Server into HDFS and Hive.
- Developed queries and scripts for data transformation in Hive warehouse.
- Developed a data pipeline using Spark and Hive to ingest, transform, and analyze customer behavioral data.
- Ingested data from relational databases into HDFS using Sqoop and Spark, loaded it into Hive tables, and transformed and analyzed large data sets by running Hive queries and Spark jobs (see the sketch after this list).
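A minimal sketch of this relational-to-Hive load in Spark and Scala, assuming placeholder database, table, and credential names:

    import org.apache.spark.sql.SparkSession

    object MySqlToHive {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("mysql-to-hive")
          .enableHiveSupport()
          .getOrCreate()

        // Read the source table from MySQL over JDBC (placeholder connection details).
        val visits = spark.read.format("jdbc")
          .option("url", "jdbc:mysql://mysql-host:3306/clinical")
          .option("dbtable", "patient_visits")
          .option("user", "etl_user")
          .option("password", sys.env("MYSQL_PWD"))
          .load()

        // Drop records missing key fields before loading into the warehouse.
        val cleaned = visits.na.drop(Seq("patient_id", "visit_date"))

        // Write into a Hive table that downstream HiveQL queries and Spark jobs use.
        cleaned.write.mode("overwrite").saveAsTable("research.patient_visits")
      }
    }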
Environment: Hortonworks, Hive, Sqoop, Spark, Spark SQL, ZooKeeper, Oozie, HBase, Scala, Eclipse, Pig, DataStage, Kafka, Linux, and R.
Confidential, Fort Collins, CO
Big Data Engineer
Responsibilities:
- Worked on the Hadoop ecosystem, including Hive, HBase, Oozie, ZooKeeper, and Spark Streaming.
- Spark receives streaming input from online services such as searches, requests, applications, and clicks for understanding visitor behavior; all of the data is stored, transformed, and analyzed on the Hadoop platform.
- Responsible for developing Spark Streaming and MapReduce jobs; also involved in installations and upgrades, providing a large-scale data lake to store, transform, and analyze data as required by the business.
- Involved in Hadoop cluster administration and maintenance of large volumes of storage.
- Involved in developing Spark Streaming jobs by writing RDDs and building DataFrames with Spark SQL as needed (see the sketch after this list).
- Involved in running Oozie jobs daily, weekly, or bi-monthly as required to track storage usage and support capacity planning.
- Implemented partitioning and bucketing of data in Hive for fast performance.
- Developed external tables in Hive that provide the data required for analysis through HiveQL queries.
- Experience working with Sqoop to transfer data between HDFS and relational databases like MySQL, and experience using Talend for the same purpose.
- Used Apache Spark on YARN for fast, large-scale data processing and improved performance.
- Tackled problems and completed the tasks committed to during each sprint.
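A minimal Spark Streaming sketch in the spirit of these jobs, using the spark-streaming-kafka-0-10 integration; the broker, topic, batch interval, and output path are assumptions:

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010._

    object ClickstreamJob {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("clickstream")
        val ssc = new StreamingContext(conf, Seconds(60))

        val kafkaParams = Map[String, Object](
          "bootstrap.servers" -> "broker1:9092",
          "key.deserializer" -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id" -> "clickstream-etl",
          "auto.offset.reset" -> "latest")

        // Direct stream of click events from the Kafka topic.
        val stream = KafkaUtils.createDirectStream[String, String](
          ssc,
          LocationStrategies.PreferConsistent,
          ConsumerStrategies.Subscribe[String, String](Seq("web-clicks"), kafkaParams))

        // Keep only the message payload and persist each non-empty micro-batch to HDFS.
        stream.map(_.value()).foreachRDD { rdd =>
          if (!rdd.isEmpty()) rdd.saveAsTextFile(s"/data/clicks/batch-${System.currentTimeMillis()}")
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }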
Environment: HDFS, Hue, MapReduce, Hive, Pig, Sqoop, Kafka, Talend, Spark, Spark SQL, ZooKeeper, Oozie, HBase, DataStage, Eclipse, Python, Linux, AWS, EMR, and Cloudera.
Confidential, Englewood, CO
Big Data Developer
Responsibilities:
- Worked with Hadoop, Hive, HBase, HDFS, MapReduce, Oozie, Kafka, Pig, Solr, Hadoop cluster administration, Big SQL, and Spark. Confidential receives large volumes of data and data analytics plays an important role, so it became a Hadoop vendor and developed a distribution with the key features its clients need. My role was to plan HDFS capacity, data ingestion, and ETL processing, and to develop MapReduce jobs in Hive and Pig.
- Worked on importing and exporting data between different relational database systems like DB2 and HDFS/Hive using Sqoop.
- Experience with Spark and Kafka for ingesting streams of data.
- Developed multiple MapReduce jobs for data cleaning and preprocessing.
- Developed automated job flows that run through Oozie daily and on demand, which execute MapReduce jobs internally.
- Wrote HiveQL queries against Hive tables, developed external tables required for ETL, and generated reports from the data that are very useful for analysis (see the sketch after this list).
- Created jobs and transformations in Pentaho Data Integration to generate reports and transfer data from HBase to an RDBMS.
- Implemented advanced procedures such as text analytics and processing using in-memory computing capabilities in Spark.
- Experience processing unstructured data using Pig and Hive.
- Worked on Apache Solr, which is used as an indexing and search engine.
- Experience with Big SQL, a low-latency interactive SQL engine that is very useful for the business.
- Actively involved in code reviews, troubleshooting issues, and bug fixing to improve performance.
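An illustrative sketch of the external-table ETL pattern described above, expressed as HiveQL submitted through a Hive-enabled Spark session in Scala; the schema, table names, and HDFS location are placeholders:

    import org.apache.spark.sql.SparkSession

    object SalesEtl {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("sales-etl")
          .enableHiveSupport()
          .getOrCreate()

        // External table over delimited files landed on HDFS by Sqoop (placeholder schema).
        spark.sql("""
          CREATE EXTERNAL TABLE IF NOT EXISTS staging.sales_raw (
            order_id BIGINT, customer_id BIGINT, amount DOUBLE, order_ts STRING)
          ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
          LOCATION '/data/staging/sales'
        """)

        // Aggregate into a reporting table used for analysis and report generation.
        spark.sql("""
          CREATE TABLE reporting.daily_sales AS
          SELECT to_date(order_ts) AS order_date,
                 COUNT(*)          AS orders,
                 SUM(amount)       AS revenue
          FROM staging.sales_raw
          GROUP BY to_date(order_ts)
        """)
      }
    }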
Environment: Hadoop, MapReduce, Hive, HDFS, Pig, Sqoop, Kafka, Spark, Lucene, Pentaho, Oozie, HBase, Big SQL, IBM InfoSphere DataStage, Java, Red Hat Enterprise Linux, Python, Talend, AWS, EMR.
Confidential
Hadoop Developer
Responsibilities:
- Worked on analyzing and transforming data using different big data tools, including Hive and MapReduce.
- Involved in increasing system performance by adding real-time components like Flume and Storm to the platform.
- Installed and configured Storm, Flume, and ZooKeeper on the Hadoop cluster.
- Installed Oozie workflow engine to run multiple Hive and Pig Jobs.
- Developed MapReduce programs in Java for data analysis and data cleaning.
- Involved in defining job flows and running data streaming jobs to process terabytes of text data.
- Worked with the Apache Crunch library to write, test, and run MapReduce pipeline jobs.
- Developed Pig Latin scripts for the analysis of semi-structured data.
- Continuous monitoring and provisioning of Hadoop cluster through Cloudera Manager.
- Created Hive tables, and was involved in data loading and writing Hive UDFs (see the sketch after this list).
- Worked on Impala to obtain fast query results without any transformation of the data.
- Worked on Kafka and Storm to ingest real-time data streams and push the data into HDFS or HBase as appropriate.
- Used Tableau for visualizing and analyzing the data.
- Experience using the Solr search engine for indexing and searching data.
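An illustrative sketch of a simple Hive UDF of the kind referenced above, written in Scala against Hive's classic UDF API; the package, class name, and masking logic are hypothetical:

    package com.example.hive.udfs  // placeholder package

    import org.apache.hadoop.hive.ql.exec.UDF
    import org.apache.hadoop.io.Text

    // Masks the local part of an email address before the data is used for analysis.
    class MaskEmail extends UDF {
      def evaluate(input: Text): Text = {
        if (input == null) {
          null
        } else {
          val s = input.toString
          val at = s.indexOf('@')
          if (at <= 0) new Text("***") else new Text("***" + s.substring(at))
        }
      }
    }

Once packaged into a jar, such a function would typically be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION mask_email AS 'com.example.hive.udfs.MaskEmail', then called like any built-in function in HiveQL.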
Environment: Hadoop, MapReduce, Hive, HDFS, Pig, CDH4.x, Sqoop, Kafka, Storm, Oozie, HBase, Cloudera Manager, Crunch, Tableau, Linux.
Confidential
Software Engineer
Responsibilities:
- Developed fixes for database upgrade issues.
- Developed procedures and views for the BI team while designing client-specific reports.
- Restored dev and QA environment databases with updated data.
- Generated reports using Crystal Reports.
- SME (Subject Matter Expert) in application testing; helped the team draft and revise test plans and test scripts.