Big Data Engineer Resume
Phoenix, AZ
SUMMARY:
- 4 years of IT industry experience encompassing a wide range of skills.
- 3+ years of experience working with Big Data technologies on highly distributed systems comprising several applications and massive volumes of data, using the Cloudera and Hortonworks Hadoop distributions.
- Experience importing and exporting data between HDFS/Hive and relational database systems such as MySQL using Sqoop and Spark.
- Experience running Oozie jobs daily, weekly, or bi-monthly as needed by the business; the workflows execute as MapReduce jobs.
- Executed complex HiveQL queries for required data extraction from Hive tables.
- Experience developing MapReduce jobs for data cleaning and data manipulation as required by the business.
- Experience administering and monitoring Hadoop clusters: commissioning and decommissioning nodes, file system checks, cluster maintenance, upgrades, etc.
- Good knowledge on NoSQL Databases including HBase.
- Installation, configuration, and administration experience on Big Data platforms using Cloudera Manager.
- Maintained and analyzed large data sets, in the terabyte range, efficiently.
- Ran Spark in YARN cluster mode for improved performance.
- Monitored MapReduce jobs and YARN applications.
- Extensive experience in working with Oracle, MS SQL Server, DB2, MySQL.
- Experienced in SDLC, Agile, and hybrid methodologies.
- Ability to meet deadlines without compromising the quality of deliverables.
- Excellent communication, interpersonal, and problem-solving skills; a team player.
- Ability to quickly adapt to new environments and technologies.
TECHNICAL SKILLS:
Big Data Technologies: HDFS, Hadoop, Hive, Oozie, Zookeeper, Impala, Sqoop, MapReduce, Tez, Spark, Flume, HBase, Solr, Kafka, YARN, Pig, Avro, Storm, Kudu
Distributions: Cloudera, Hortonworks
Monitoring Tools: Cloudera Manager, Ambari
Programming Languages: Python, SQL, Java, HiveQL, Shell Scripting, Scala, R
Databases: NoSQL (HBase, MapR-DB, MongoDB), Oracle 12c/11g, MySQL, DB2, MS SQL Server
Operating Systems: Windows, Linux (RHEL, CentOS, Ubuntu)
PROFESSIONAL EXPERIENCE:
Confidential, Phoenix, AZ
Big Data Engineer
- Architected, installed, and maintained a Hadoop cluster on a large-scale distributed system.
- Participated in data collection, data cleaning, data mining, developing models and visualizations.
- Developed code to ingest data into HDFS using Sqoop, Spark, and Scala (see the illustrative sketch below).
- Converted stored procedures into equivalent HQL to run on Impala.
- Developed data pipelines utilizing technologies such as Spark, Sqoop, and Oozie.
- Developed Oozie workflows to orchestrate the Sqoop jobs that export data from sources such as Oracle, MySQL, and MS SQL Server into HDFS and Hive.
- Developed queries and scripts for data transformation in Hive warehouse.
- Developed a data pipeline using Spark and Hive to ingest, transform, and analyze customer behavioral data.
- Developed reports in QlikSense.
- Ingested data from relational databases into HDFS using Sqoop and Spark, loaded it into Hive tables, and transformed and analyzed large data sets by running Hive queries and Spark jobs.
- Worked closely with the Artificial Intelligence team.
Environment: Hortonworks, Hive, Sqoop, Spark, Spark SQL, Zookeeper, Oozie, HBase, Scala, Eclipse, Pig, DataStage, Kafka, Linux and R.
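Illustrative sketch (not code from the project): a minimal ingestion job of the kind described above, written here with the Spark Java API for brevity (the actual jobs used Sqoop, Spark, and Scala). The JDBC URL, credentials, and table names are placeholder assumptions.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SaveMode;
    import org.apache.spark.sql.SparkSession;

    public class MySqlToHiveIngest {
        public static void main(String[] args) {
            // Hive support lets saveAsTable write directly into the Hive warehouse.
            SparkSession spark = SparkSession.builder()
                    .appName("MySqlToHiveIngest")
                    .enableHiveSupport()
                    .getOrCreate();

            // Read the source table over JDBC; connection details are placeholders.
            Dataset<Row> orders = spark.read()
                    .format("jdbc")
                    .option("url", "jdbc:mysql://db-host:3306/sales")
                    .option("dbtable", "orders")
                    .option("user", "etl_user")
                    .option("password", "********")
                    .load();

            // Land the data in a Hive staging table, appending on incremental runs.
            orders.write().mode(SaveMode.Append).saveAsTable("staging.orders");

            spark.stop();
        }
    }

A job like this would typically be packaged, submitted with spark-submit, and scheduled from the same Oozie workflows that orchestrate the Sqoop jobs.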
Confidential, Bellevue, WA
Big Data Developer
Responsibilities:
- Worked on the Hadoop ecosystem, including Hive, HBase, Oozie, Zookeeper, and Spark Streaming.
- Spark Streaming receives inputs from online services (searches, requests, applications, clicks) used to understand visitor behavior; all of the data is stored, transformed, and analyzed on the Hadoop platform.
- Responsible for developing Spark Streaming and MapReduce jobs; also involved in installations and upgrades, providing a large-scale data lake to store, transform, and analyze data as required by the business.
- Involved in Hadoop cluster administration and the maintenance of large volumes of storage.
- Developed Spark Streaming jobs using RDDs and built DataFrames with Spark SQL as needed (see the illustrative sketch below).
- Ran Oozie jobs daily, weekly, or bi-monthly as needed to track storage usage and support capacity planning.
- Implemented partitioning and bucketing of data in Hive for faster query performance.
- Developed external tables in Hive and wrote HiveQL queries to obtain the data required for analysis.
- Used Sqoop to transfer data between HDFS and relational databases such as MySQL, and also used Talend for the same purpose.
- Used Apache Spark on YARN for fast, large-scale data processing and improved performance.
- Tackled problems and completed the tasks committed to in each sprint.
Environment: HDFS, Hue, MapReduce, Hive, Pig, Sqoop, Kafka, Talend, Spark, Spark SQL, Zookeeper, Oozie, HBase, DataStage, Eclipse, Python, Linux, AWS, Cloudera.
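Illustrative sketch (not code from the project): a minimal Spark Streaming job, written with the Spark Java API and the Kafka 0.10 direct-stream connector, in the spirit of the clickstream ingestion described above. The broker address, topic name, and HDFS output path are placeholder assumptions.

    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaDStream;
    import org.apache.spark.streaming.api.java.JavaInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.kafka010.ConsumerStrategies;
    import org.apache.spark.streaming.kafka010.KafkaUtils;
    import org.apache.spark.streaming.kafka010.LocationStrategies;

    public class ClickstreamIngest {
        public static void main(String[] args) throws Exception {
            SparkConf conf = new SparkConf().setAppName("ClickstreamIngest");
            // 30-second micro-batches.
            JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(30));

            Map<String, Object> kafkaParams = new HashMap<>();
            kafkaParams.put("bootstrap.servers", "broker1:9092");   // placeholder broker
            kafkaParams.put("key.deserializer", StringDeserializer.class);
            kafkaParams.put("value.deserializer", StringDeserializer.class);
            kafkaParams.put("group.id", "clickstream");

            JavaInputDStream<ConsumerRecord<String, String>> stream =
                    KafkaUtils.createDirectStream(
                            jssc,
                            LocationStrategies.PreferConsistent(),
                            ConsumerStrategies.<String, String>Subscribe(
                                    Arrays.asList("clicks"), kafkaParams)); // placeholder topic

            // Keep the message payload and land each micro-batch in HDFS
            // for downstream Hive / Spark SQL analysis.
            JavaDStream<String> clicks = stream.map(ConsumerRecord::value);
            clicks.foreachRDD((rdd, time) ->
                    rdd.saveAsTextFile("/data/raw/clicks/" + time.milliseconds()));

            jssc.start();
            jssc.awaitTermination();
        }
    }

Downstream, the landed files would be exposed through external Hive tables, partitioned and bucketed as described above.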
Confidential, Bloomfield, CT
Big Data Developer
Responsibilities:
- Worked with Hadoop, Hive, HBase, HDFS, MapReduce, Oozie, Kafka, Pig, Solr, Hadoop cluster administration, Big SQL, and Spark. Sports Authority receives large volumes of data and analytics plays an important role in the business, so it adopted a Hadoop distribution with the key features required for its needs. My role was to plan HDFS capacity, handle data ingestion and ETL processing, and develop MapReduce jobs through Hive and Pig.
- Imported and exported data between relational database systems such as DB2 and HDFS/Hive using Sqoop.
- Used Spark and Kafka to consume streams of data.
- Developed multiple MapReduce jobs for data cleaning and preprocessing (see the illustrative sketch below).
- Developed automated job flows, run through Oozie daily and on demand, which execute MapReduce jobs internally.
- Wrote HiveQL queries against Hive tables, developed the external tables required for ETL, and generated reports from the data that are very useful for analysis.
- Created jobs and transformations in Pentaho Data Integration to generate reports and transfer data from HBase to an RDBMS.
- Implemented advanced procedures such as text analytics and processing using in-memory computing with Spark.
- Processed unstructured data using Pig and Hive.
- Worked on Apache Solr, used as the indexing and search engine.
- Actively involved in code reviews, troubleshooting issues, and bug fixing to improve performance.
Environment: Hadoop, MapReduce, Hive, HDFS, Pig, Sqoop, Kafka, Spark, Oozie, HBase, DataStage, Java, Red Hat Enterprise Linux, Python, Talend.
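Illustrative sketch (not code from the project): a minimal map-only MapReduce cleaning step in Java of the kind mentioned above. The pipe-delimited layout and the choice of field 2 as the required id are assumptions; the driver would set the number of reducers to zero.

    import java.io.IOException;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Map-only cleaning step: drops blank lines and records missing a key field.
    public class CleanRecordsMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString().trim();
            if (line.isEmpty()) {
                return;                                  // skip blank lines
            }
            String[] fields = line.split("\\|", -1);     // assumed pipe-delimited layout
            if (fields.length < 2 || fields[1].isEmpty()) {
                return;                                  // skip records with a missing id
            }
            context.write(new Text(line), NullWritable.get());
        }
    }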
Confidential
Hadoop Developer
Responsibilities:
- Worked on analyzing and transforming data using different big data tools, including Hive and MapReduce.
- Improved system performance by adding real-time components such as Flume and Storm to the platform.
- Installed and configured Storm, Flume, Zookeeper on the Hadoop cluster.
- Installed Oozie workflow engine to run multiple Hive and Pig Jobs.
- Developed MapReduce programs in Java for data analysis and data cleaning.
- Involved in defining job flows and running data streaming jobs to process terabytes of text data.
- Worked with Apache Crunch library to write, test and run MapReduce pipeline jobs.
- Developed Pig Latin scripts for the analysis of semi-structured data.
- Continuously monitored and provisioned the Hadoop cluster through Cloudera Manager.
- Created Hive tables, loaded data, and wrote Hive UDFs (see the illustrative sketch below).
- Worked on Impala to obtain fast results without any transformation of the data.
- Worked on Kafka and Storm to ingest real-time data streams and push the data to HDFS or HBase as appropriate.
- Used Tableau for visualizing and analyzing the data.
Environment: Hadoop, MapReduce, Hive, HDFS, Pig, CDH4.x, Sqoop, Kafka, Storm, Oozie, HBase, Cloudera Manager, Crunch, Tableau, Linux.
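Illustrative sketch (not code from the project): a minimal Hive UDF in Java of the kind mentioned above; the function name and normalization rule are assumptions for illustration.

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Strips non-digit characters so values such as phone numbers compare consistently.
    public final class NormalizePhone extends UDF {
        public Text evaluate(Text input) {
            if (input == null) {
                return null;
            }
            return new Text(input.toString().replaceAll("[^0-9]", ""));
        }
    }

Once packaged into a JAR, a function like this would be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being called from HiveQL queries.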
Confidential
Software Engineer
Responsibilities:
- Involved in analysis, estimation, and development of the project.
- Used the Struts MVC framework to enable interaction between the JSP/view layers (see the illustrative sketch below).
- Participated in client meetings and calls, both individually and with the team.
- Provided weekly status reports on project progress.
- Preparing release notes and taking care of all deployment activities.
- Developed XML parsing code.
- Responsible for the build process used to deploy the applications.
- Involved in creating patches, maintaining the code base, and running sanity code checks for releases to the client.
- Completed development on time and within the scheduled plan.
- Performed QA testing of completed requirements.
- Performed unit, system integration, and regression testing.
- Reviewed code and provided technical support to the team.
- Wrote test cases for unit, functional, and integration testing.
Environment: J2EE (Servlets, JSP, Struts Framework), JSTL, Search Engine (Autonomy), MySQL, AMS (Access Management System), Tomcat 5.5, Apache Solr, Eclipse IDE, SQL Query Browser
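Illustrative sketch (not code from the project): a minimal Struts 1 action of the kind used to connect the controller to the JSP/view layer; the action name, request parameter, and forward mapping are assumptions for illustration.

    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    import org.apache.struts.action.Action;
    import org.apache.struts.action.ActionForm;
    import org.apache.struts.action.ActionForward;
    import org.apache.struts.action.ActionMapping;

    // Places a request-scoped attribute and forwards to a JSP view
    // configured under the "success" forward in struts-config.xml.
    public class CustomerLookupAction extends Action {
        @Override
        public ActionForward execute(ActionMapping mapping, ActionForm form,
                                     HttpServletRequest request, HttpServletResponse response) {
            String customerId = request.getParameter("customerId");
            request.setAttribute("customerId", customerId);
            return mapping.findForward("success");
        }
    }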