Big Data Engineer Resume
St Louis, MO
SUMMARY:
- Currently working in a Big Data capacity using the Hadoop ecosystem across internal and cloud-based platforms.
- Over 9 years of experience in Big Data/Hadoop with skills in analysis, design, development, testing and deployment of various software applications.
- Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa.
- Good knowledge of using Hibernate for mapping Java classes to database tables and using Hibernate Query Language (HQL).
- Hands on experience in configuring and working with Flume to load the data from multiple sources directly into HDFS.
- Experience in developing custom UDFs for Pig and Apache Hive to incorporate Java methods and functionality into Pig Latin and HiveQL.
- Good experience in developing MapReduce jobs in J2EE/Java for data cleansing, transformation, pre-processing and analysis.
- Good knowledge of Amazon Web Services (AWS) concepts such as EMR and EC2 web services, which provide fast and efficient processing for Teradata Big Data analytics.
- Experience in collecting log data and JSON data into HDFS using Flume and processing the data using Hive/Pig.
- Strong exposure to Web 2.0 client technologies using JSP, JSTL, XHTML, HTML5, DOM, CSS3, JavaScript and AJAX.
- Experience working with cloud platforms, setting up environments and applications on AWS, automation of code and infrastructure (DevOps) using Chef and Jenkins
- Extensive experience developing Spark Streaming jobs using RDDs (Resilient Distributed Datasets) and Spark SQL as required.
- Experience developing Java MapReduce jobs for data cleaning and data manipulation as required by the business.
- Strong knowledge of Hadoop ecosystem components including HDFS, Hive, Oozie, HBase, Pig, Sqoop, Zookeeper, etc.
- Extensive experience with advanced J2EE frameworks such as Spring, Struts, JSF and Hibernate.
- Expertise in JavaScript, JavaScript MVC patterns, object-oriented JavaScript design patterns and AJAX calls.
- Installation, configuration and administration experience with Big Data platforms: Cloudera Manager (Cloudera) and MCS (MapR).
TECHNICAL SKILLS:
- Hadoop/Big Data Technologies: Hadoop 2.7/2.5, HDFS, MapReduce, HBase 1.2.4, Pig, Hive 2.0, Hue, Sqoop, Spark 2.0/2.0.2, Impala, Oozie, YARN, Flume 1.7, Kafka, Zookeeper
- Hadoop Distributions: Cloudera 5.9, Hortonworks, MapR
- Programming Language: Java, Scala, Python 3.5, SQL, PL/SQL, Shell Scripting, Storm, JSP, Servlets
- Frameworks: Spring 4.3, Hibernate, Struts, JSF, EJB, JMS
- Web Technologies: HTML, CSS, JavaScript, JQuery, Bootstrap, XML, JSON, AJAX
- Databases: Oracle 12c/11g, SQL Server 2016/2014, MySQL 5.7/5.4.16
- Database Tools: TOAD, SQL*Plus, SQLite 3.15/3.15.2
- Operating Systems: Linux, Unix, Windows 8/7
- IDE and Tools: Eclipse 4.6, Netbeans 8.2, IntelliJ, Maven
- NoSQL Databases: HBase, Cassandra, MongoDB, Accumulo
- Web/Application Server: Apache Tomcat, JBoss, WebLogic, WebSphere
- SDLC Methodologies: Agile, Waterfall
- Version Control: GIT, SVN, CVS
PROFESSIONAL EXPERIENCE
Confidential, St. Louis, MO
Big Data Engineer
Responsibilities:
- Performed data transformations like filtering, sorting, and aggregation using Pig.
- Created Sqoop jobs to import data from SQL Server, Oracle, and Teradata into HDFS.
- Created Hive tables to push the data to MongoDB.
- Wrote complex aggregation queries in MongoDB for report generation.
- Developed scripts to run scheduled batch cycles using Oozie and present data for reports.
- Worked on a POC for building a movie recommendation engine based on Fandango ticket sales data using Scala and Spark Machine Learning library.
- Developed a big data ingestion framework to process multi-TB data, including data quality checks and transformations, stored the output in efficient formats such as Parquet, and loaded it into Amazon S3 using the Spark Scala API (see the ingestion sketch after this section).
- Implemented automation, traceability, and transparency for every step of the process to build trust in data and streamline data science efforts using Python, Java, Hadoop Streaming, Apache Spark, Spark SQL, Scala, Hive, and Pig.
- Designed highly efficient data model for optimizing large-scale queries utilizing Hive complex data types and Parquet file format.
- Worked in an AWS environment for the development and deployment of Hadoop applications.
- Performed data validation and transformation using Python and Hadoop streaming.
- Developed highly efficient Pig Java UDFs utilizing advanced concepts such as the Algebraic and Accumulator interfaces to populate ADP Benchmarks cube metrics.
- Loaded data from different data sources (Teradata and DB2) into HDFS using Sqoop and loaded it into partitioned Hive tables.
- Developed Bash scripts to fetch TLOG files from the FTP server and process them for loading into Hive tables.
- Automated workflows using shell scripts and Control-M jobs to pull data from various databases into Hadoop Data Lake.
- Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
- Used daily INSERT OVERWRITE to refresh the Hive data with HBase data, and used Sqoop to load data from DB2 into the HBase environment.
- Converted Hive/SQL queries into Spark transformations using Spark RDDs and Scala, and gained good experience using Spark Shell and Spark Streaming (see the Spark SQL sketch after this section).
- Designed, developed and maintained Big Data streaming and batch applications using Storm.
- Created Hive, Phoenix, HBase tables and HBase integrated Hive tables as per the design using ORC/Parquet file format and Snappy compression.
- Created Sqoop jobs and Pig and Hive scripts for data ingestion from relational databases to compare with historical data.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Developed Pig scripts to transform the data into a structured format, automated through Oozie coordinators.
- Used Splunk to capture, index and correlate real-time data in a searchable repository from which reports and alerts can be generated.
Environment: Hadoop, HDFS, Spark, Storm, Kafka, MapReduce, Hive, Pig, Sqoop, AWS, Oozie, DB2, Java, Python, Splunk, UNIX Shell Scripting.
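The ingestion step above can be illustrated with a minimal Spark/Scala sketch: read raw delimited files, apply simple data-quality filters, and write Parquet to Amazon S3. The paths, column names, and bucket are illustrative assumptions, not the actual project code.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object IngestToS3 {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("IngestionFramework")
      .getOrCreate()

    // Read raw pipe-delimited files landed on HDFS (illustrative path and schema).
    val raw = spark.read
      .option("header", "true")
      .option("delimiter", "|")
      .csv("hdfs:///data/landing/tlog/")

    // Basic data-quality checks: drop records missing a key or with non-numeric amounts.
    val cleaned = raw
      .filter(col("transaction_id").isNotNull)
      .filter(col("amount").cast("double").isNotNull)

    // Store as Parquet on S3, partitioned by business date (hypothetical bucket).
    cleaned.write
      .mode("overwrite")
      .partitionBy("business_date")
      .parquet("s3a://example-bucket/curated/tlog/")

    spark.stop()
  }
}
```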
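And a hedged Scala sketch of converting a Hive query into equivalent Spark SQL and DataFrame transformations, as mentioned above; the sales table and its columns are assumed purely for illustration.

```scala
// Spark-shell style snippet; assumes Hive support is available on the cluster.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder()
  .appName("HiveToSpark")
  .enableHiveSupport()
  .getOrCreate()

// The original Hive query can be run as-is through Spark SQL:
//   SELECT store_id, SUM(amount) AS total_sales
//   FROM sales WHERE business_date = '2017-01-01' GROUP BY store_id
val viaSql = spark.sql(
  """SELECT store_id, SUM(amount) AS total_sales
     FROM sales
     WHERE business_date = '2017-01-01'
     GROUP BY store_id""")

// Equivalent DataFrame transformations, easier to compose and unit test.
val viaDf = spark.table("sales")
  .filter(col("business_date") === "2017-01-01")
  .groupBy("store_id")
  .agg(sum("amount").alias("total_sales"))

// Persist the result for reporting (hypothetical target table).
viaDf.write.mode("overwrite").saveAsTable("reports.daily_store_sales")
```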
Confidential, New Jersey
Big Data Developer
Responsibilities:
- As a Big Data Developer, worked on Hadoop ecosystem components including Hive, HBase, Oozie, Yarn, Spark Streaming and MCS (MapR Control System) with the MapR distribution.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Primarily involved in Data Migration process using Azure by integrating with Bitbucket repository and Jenkins.
- Built code for real-time data ingestion using Java and MapR Streams (Kafka).
- Involved in various phases of development; analyzed and developed the system following the Agile Scrum methodology.
- Involved in development of Hadoop System and improving multi-node Hadoop Cluster performance.
- Worked on analyzing the Hadoop stack and different Big Data tools including Pig, Hive, the HBase database and Sqoop.
- Developed a data pipeline using Flume, Sqoop and Pig to extract data from weblogs and store it in HDFS.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Worked with different data sources like Avro data files, XML files, JSON files, SQL server and Oracle to load data into Hive tables.
- Used J2EE design patterns like Factory pattern & Singleton Pattern.
- Used Spark to create structured data from large amounts of unstructured data from various sources.
- Implemented usage of Amazon EMR for processing Big Data across Hadoop Cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3)
- Performed transformations, cleaning and filtering on imported data using Hive, MapReduce, Impala and loaded final data into HDFS.
- Developed Python scripts to find vulnerabilities in SQL queries by performing SQL injection testing.
- Experienced in designing and developing POC's in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
- Responsible for coding MapReduce program, Hive queries, testing and debugging the MapReduce programs.
- Extracted a real-time feed using Spark Streaming, converted it to RDDs, processed the data into DataFrames and loaded the data into Cassandra (see the streaming sketch after this section).
- Involved in the process of data acquisition, data pre-processing and data exploration of telecommunication project in Scala.
- Implemented a distributed messaging queue to integrate with Cassandra using Apache Kafka and Zookeeper.
- Specified the cluster size, allocating Resource pool, Distribution of Hadoop by writing the specification texts in JSON File format.
- Imported weblogs and unstructured data using Apache Flume and stored the data in a Flume channel.
- Exported event weblogs to HDFS by creating an HDFS sink which directly deposits the weblogs in HDFS.
- Used RESTful web services with MVC for parsing and processing XML data.
- Utilized XML and XSL Transformation for dynamic web-content and database connectivity.
- Involved in loading data from the UNIX file system to HDFS. Involved in designing schemas, writing CQL queries and loading data using Cassandra.
- Built the automated build and deployment framework using Jenkins, Maven, etc.
Environment: Spark, HDFS, Kafka, MapReduce (MR1), Pig, Hive, Sqoop, Cassandra, AWS, Talend, Java, Linux Shell Scripting
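A minimal Scala sketch of the streaming flow described above: consume a Kafka (MapR Streams) feed with Spark Streaming, convert each micro-batch RDD to a DataFrame, and persist it to Cassandra. The broker, topic, keyspace, and record layout are assumptions for illustration.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object StreamToCassandra {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("StreamToCassandra")
      .config("spark.cassandra.connection.host", "cassandra-host") // hypothetical host
      .getOrCreate()
    val ssc = new StreamingContext(spark.sparkContext, Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker:9092",          // hypothetical broker
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "realtime-ingestion")

    // Direct stream from the assumed "events" topic.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

    stream.foreachRDD { rdd =>
      import spark.implicits._
      // Each record value is assumed to be "userId,eventType" for illustration.
      val df = rdd.map(_.value.split(","))
        .map(f => (f(0), f(1)))
        .toDF("user_id", "event_type")
      // Append the micro-batch into an existing Cassandra table.
      df.write
        .format("org.apache.spark.sql.cassandra")
        .options(Map("keyspace" -> "learner", "table" -> "events"))
        .mode("append")
        .save()
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```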
Confidential, Atlanta, GA
Big Data/Hadoop Developer
Responsibilities:
- Involved in Analysis, Design, System architectural design, Process interfaces design, design documentation.
- Responsible for developing prototypes of the selected solutions and implementing complex big data projects with a focus on collecting, parsing, managing, analyzing and visualizing large sets of data using multiple platforms.
- Applied technologies to solve big data problems and developed innovative big data solutions.
- Developed Spark applications using Scala and Java and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
- Used Spark Streaming APIs to perform transformations and actions on the fly to build a common learner data model that gets data from Kafka in near real time and persists it to Cassandra.
- Responsible for analyzing and cleansing raw data by running Hive queries and Pig scripts on the data.
- Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Imported data from various sources into the Cassandra cluster using Sqoop. Worked on creating data models for Cassandra from the existing Oracle data model.
- Used the Spark-Cassandra connector to load data to and from Cassandra (see the Spark-Cassandra sketch after this section).
- Worked in Spark and Scala for data analytics. Handled the ETL framework in Spark for writing data from HDFS to Hive.
- Used a Scala-based framework for ETL.
- Developed multiple Spark Streaming and Spark Core jobs with Kafka as a data pipeline system.
- Worked with and learned a great deal from AWS cloud services such as EC2, S3 and EBS.
- Migrated an existing on-premises application to AWS. Used AWS services such as EC2 and S3 for small data set processing and storage.
- Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.
- Extensively used Zookeeper as a job scheduler for Spark jobs.
- Worked on Talend with Hadoop. Worked on migrating jobs from Informatica to Talend.
- Implemented a distributed messaging queue to integrate with Cassandra using Apache Kafka and Zookeeper.
- Developed Kafka producer and consumer components for real-time data processing (see the Kafka sketch after this section).
- Worked on physical transformations of the data model, which involved creating tables, indexes, joins, views and partitions.
- Involved in Cassandra data modeling to create keyspaces and tables in a multi-data-center DSE Cassandra database.
Environment: Spark, HDFS, Kafka, MapReduce (MR1), Pig, Hive, Sqoop, Cassandra, AWS, Talend, Java, Linux Shell Scripting.
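A minimal Scala sketch of loading data to and from Cassandra with the Spark-Cassandra connector RDD API, as referenced above; the keyspace, tables, and columns are assumptions.

```scala
import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

object CassandraRoundTrip {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("CassandraRoundTrip")
      .set("spark.cassandra.connection.host", "cassandra-host") // hypothetical host
    val sc = new SparkContext(conf)

    // Read rows from an existing table into an RDD of CassandraRow.
    val calls = sc.cassandraTable("telecom", "call_records")
      .select("subscriber_id", "duration_sec")

    // Aggregate total call time per subscriber.
    val totals = calls
      .map(row => (row.getString("subscriber_id"), row.getLong("duration_sec")))
      .reduceByKey(_ + _)

    // Write the aggregates back to another (assumed) Cassandra table.
    totals.saveToCassandra("telecom", "subscriber_totals",
      SomeColumns("subscriber_id", "total_duration_sec"))

    sc.stop()
  }
}
```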
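And a hedged Scala sketch of the Kafka producer and consumer components mentioned above, using the standard Kafka client API; the broker address, topic, and group id are illustrative.

```scala
import java.util.{Collections, Properties}
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import scala.collection.JavaConverters._

object KafkaComponents {
  private val topic = "call-events" // hypothetical topic

  def buildProducer(): KafkaProducer[String, String] = {
    val props = new Properties()
    props.put("bootstrap.servers", "broker:9092") // hypothetical broker
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    new KafkaProducer[String, String](props)
  }

  def main(args: Array[String]): Unit = {
    // Produce a sample event.
    val producer = buildProducer()
    producer.send(new ProducerRecord[String, String](topic, "subscriber-42", "CALL_STARTED"))
    producer.close()

    // Consume and print events from the same topic.
    val props = new Properties()
    props.put("bootstrap.servers", "broker:9092")
    props.put("group.id", "realtime-processors")
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    val consumer = new KafkaConsumer[String, String](props)
    consumer.subscribe(Collections.singletonList(topic))
    consumer.poll(1000L).asScala.foreach(r => println(s"${r.key} -> ${r.value}"))
    consumer.close()
  }
}
```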
Confidential, Chicago, IL
Big Data/Hadoop Developer
Responsibilities:
- Performed data transformations like filtering, sorting, and aggregation using Pig
- Created Sqoop jobs to import data from SQL Server, Oracle, and Teradata into HDFS.
- Created Hive tables to push the data to MongoDB.
- Wrote complex aggregation queries in MongoDB for report generation.
- Developed scripts to run scheduled batch cycles using Oozie and present data for reports
- Worked on a POC for building a movie recommendation engine based on Fandango ticket sales data using Scala and the Spark machine learning library.
- Developed a big data ingestion framework to process multi-TB data, including data quality checks and transformations, stored the output in efficient formats such as Parquet, and loaded it into Amazon S3 using the Spark Scala API.
- Implemented automation, traceability, and transparency for every step of the process to build trust in data and streamline data science efforts using Python, Java, Hadoop Streaming, Apache Spark, Spark SQL, Scala, Hive, and Pig.
- Designed highly efficient data model for optimizing large-scale queries utilizing Hive complex data types and the Parquet file format.
- Performed data validation and transformation using Python and Hadoop streaming.
- Developed highly efficient Pig Java UDFs utilizing advanced concepts such as the Algebraic and Accumulator interfaces to populate ADP Benchmarks cube metrics.
- Loaded data from different data sources (Teradata and DB2) into HDFS using Sqoop and loaded it into partitioned Hive tables.
- Developed Bash scripts to fetch TLOG files from the FTP server and process them for loading into Hive tables.
- Automated workflows using shell scripts and Control-M jobs to pull data from various databases into the Hadoop Data Lake.
- Involved in story-driven Agile development methodology and actively participated in daily scrum meetings.
- Used daily INSERT OVERWRITE to refresh the Hive data with HBase data, and used Sqoop to load data from DB2 into the HBase environment.
- Converted Hive/SQL queries into Spark transformations using Spark RDDs and Scala, and gained good experience using Spark Shell and Spark Streaming.
- Designed, developed and maintained Big Data streaming and batch applications using Storm.
- Created Hive, Phoenix, HBase tables and HBase-integrated Hive tables as per the design, using the ORC file format and Snappy compression (see the sketch after this section).
- Developed Oozie workflows for daily incremental loads, which get data from Teradata and import it into Hive tables.
- Created Sqoop jobs and Pig and Hive scripts for data ingestion from relational databases to compare with historical data.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Developed Pig scripts to transform the data into a structured format, automated through Oozie coordinators.
- Used Splunk to capture, index and correlate real-time data in a searchable repository from which reports and alerts can be generated.
Environment: Hadoop, HDFS, Spark, Storm, Kafka, MapReduce, Hive, Pig, Sqoop, Oozie, DB2, Java, Python, Splunk, UNIX Shell Scripting.
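A minimal Scala sketch of loading staged data into a partitioned Hive table stored as ORC with Snappy compression, as described above; the staging path, database, table, and partition column are assumptions.

```scala
import org.apache.spark.sql.SparkSession

object LoadPartitionedHive {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("LoadPartitionedHive")
      .enableHiveSupport()
      .getOrCreate()

    // Source data previously staged on HDFS (illustrative path).
    val staged = spark.read.parquet("hdfs:///data/staging/tlog/")

    // Append into a partitioned Hive table as ORC with Snappy compression.
    staged.write
      .format("orc")
      .option("compression", "snappy")
      .mode("append")
      .partitionBy("business_date")
      .saveAsTable("retail.tlog_history") // hypothetical database.table

    spark.stop()
  }
}
```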