
Senior Big Data Engineer Resume


Columbia, SC

SUMMARY

  • Currently working in a big data capacity with the Hadoop ecosystem across internal and cloud-based platforms.
  • Over 8 years of experience in Big Data/Hadoop, with skills in analyzing, designing, developing, testing, and deploying software applications.
  • Experience in importing and exporting data between HDFS and relational database systems using Sqoop.
  • Good knowledge of Hibernate for mapping Java classes to database tables and of the Hibernate Query Language (HQL).
  • Hands-on experience configuring and working with Flume to load data from multiple sources directly into HDFS.
  • Develop and deploy Spark and Scala code to Hadoop clusters running on GCP.
  • Create firewall rules to access Google Dataproc from other machines.
  • Write Scala programs for Spark transformations on Dataproc (a minimal PySpark sketch follows this list).
  • Experience developing custom UDFs for Pig and Apache Hive to incorporate Java methods and functionality into Pig Latin and HiveQL.
  • Good experience developing MapReduce jobs in J2EE/Java for data cleansing, transformation, pre-processing, and analysis.
  • Good knowledge of Amazon Web Services (AWS) concepts such as EMR and EC2 web services, which provide fast and efficient processing for Teradata big data analytics.
  • Experience collecting log data and JSON data into HDFS using Flume and processing the data with Hive and Pig.
  • Strong exposure to Web 2.0 client technologies: JSP, JSTL, XHTML, HTML5, DOM, CSS3, JavaScript, and AJAX.
  • Experience working with cloud platforms, setting up environments and applications on AWS, and automating code and infrastructure (DevOps) with Chef and Jenkins.
  • Experience working with cloud services such as Azure, GCP, and AWS, with involvement in ETL, data integration, and migration.
  • Implemented a CI/CD pipeline with Jenkins, GitHub, Nexus, Maven, and AWS AMIs.
  • Extensive experience developing Spark Streaming jobs with RDDs (Resilient Distributed Datasets) and Spark SQL as required.
  • Experience developing Java MapReduce jobs for data cleaning and data manipulation as required by the business.
  • Strong knowledge of the Hadoop ecosystem, including HDFS, Hive, Oozie, HBase, Pig, Sqoop, and ZooKeeper.
  • Extensive experience with advanced J2EE frameworks such as Spring, Struts, JSF, and Hibernate.
  • Expertise in JavaScript, JavaScript MVC patterns, object-oriented JavaScript design patterns, and AJAX calls.
  • Installation, configuration, and administration experience on big data platforms: Cloudera Manager (Cloudera) and MCS (MapR).
  • Extensive experience working with Oracle, MS SQL Server, DB2, and MySQL.
  • Experience in text analytics, statistical machine learning, and data mining, providing solutions to various business problems and generating data visualizations using Python.
  • Experience working with Hortonworks and Cloudera environments.
  • Good knowledge of implementing various data processing techniques using Apache HBase to handle and format data as required.
  • Excellent experience installing and running various Oozie workflows and automating parallel job execution.
  • Experience with Spark, Spark SQL, Spark Streaming, Spark GraphX, and Spark MLlib.
  • Extensive development experience in IDEs such as Eclipse, NetBeans, IntelliJ, and STS.
  • Strong experience in core SQL and RESTful web services.
  • Created data lakes and data pipelines for different mobile application events, filtering and loading consumer response data from Urban Airship in AWS S3 buckets into Hive external tables in HDFS. Good experience with the Apache NiFi ecosystem.
  • Strong knowledge of NoSQL column-oriented databases such as HBase and their integration with Hadoop clusters.
  • Good experience with Tableau for data visualization and analysis of large datasets, drawing various conclusions.
  • Good knowledge of coding with SQL, SQL*Plus, T-SQL, PL/SQL, and stored procedures/functions.
  • Worked with Bootstrap, AngularJS, Node.js, Knockout, Ember, and the Java Persistence API (JPA).
  • Well versed in working with relational database management systems such as Oracle 12c, MS SQL Server, and MySQL.
  • Experience with all stages of the SDLC and the Agile development model, from requirement gathering through deployment and production support.
  • Experience using PL/SQL to write stored procedures, functions, and triggers.
  • Hands-on experience with GCP: BigQuery, GCS buckets, Cloud Functions, Cloud Dataflow, Dataproc, and Stackdriver.
  • Experience managing a GNU/Linux environment for development, including package management and basic system administration.
  • Experience building and architecting multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation on GCP, coordinating tasks among the team; implementation skills in Java, Python, and Linux shell scripting.
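
As a brief illustration of the Spark transformation work on Dataproc referenced above, the sketch below shows the general shape of such a job. It is written in PySpark purely to keep every example in this document in one language (the work itself was done in Scala), and all bucket names, paths, and column names are placeholders rather than project details.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("gcs-transform-sketch").getOrCreate()

    # Read raw CSV files landed in a GCS bucket (path and columns are hypothetical).
    events = (spark.read
              .option("header", True)
              .option("inferSchema", True)
              .csv("gs://example-landing-bucket/events/*.csv"))

    # Typical transformations: filter, group, and aggregate.
    daily = (events
             .filter(F.col("event_type") == "purchase")
             .groupBy("event_date", "region")
             .agg(F.count("*").alias("event_count"),
                  F.sum("amount").alias("revenue")))

    # Write the curated result back to GCS as partitioned Parquet.
    (daily.write
          .mode("overwrite")
          .partitionBy("event_date")
          .parquet("gs://example-curated-bucket/daily_purchases/"))

    spark.stop()

A script of this shape is normally submitted to the cluster with gcloud dataproc jobs submit pyspark (or with spark-submit for the Scala equivalent).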

TECHNICAL SKILLS

Hadoop/Big Data Technologies: Hadoop 2.7/2.5, HDFS, MapReduce, Apache NiFi, HBase 1.2.4, Pig, Hive 2.0, Hue, Sqoop, Spark 2.0/2.0.2, Impala, Oozie, YARN, Flume 1.7, Kafka, ZooKeeper

Hadoop Distributions: Cloudera 5.9, Hortonworks, MapR

Programming Languages: Java, Scala, Python 3.5, SQL, PL/SQL, Linux Shell Scripting, Storm, JSP, Servlets

Frameworks: Spring 4.3, Hibernate, Struts, JSF, EJB, JMS

Web Technologies: HTML, CSS, JavaScript, jQuery, Bootstrap, XML, JSON, AJAX

Databases: Oracle 12c/11g, SQL Server 2016/2014, MySQL 5.7/5.4.16

Database Tools: TOAD, SQL*Plus, SQLite 3.15/3.15.2

Operating Systems: Linux, Unix, Windows 8/7

IDEs and Tools: Eclipse 4.6, NetBeans 8.2, IntelliJ, Maven

NoSQL Databases: HBase, Cassandra, MongoDB, Accumulo

Web/Application Servers: Apache Tomcat, JBoss, WebLogic, WebSphere

SDLC Methodologies: Agile, Waterfall

PROFESSIONAL EXPERIENCE

Confidential, Columbia, SC

Senior Big Data Engineer

Responsibilities:

  • Performed data transformations like filtering, sorting, and aggregation using Pig.
  • Created Sqoop jobs to import data from SQL Server, Oracle, and Teradata into HDFS.
  • Created Hive tables to push the data to MongoDB.
  • Wrote complex aggregation queries in MongoDB for report generation.
  • Developed scripts to run scheduled batch cycles using Oozie and present data for reports.
  • Worked on a POC for building a movie recommendation engine based on Fandango ticket sales data using Scala and Spark Machine Learning library.
  • Created various Parser programs to extract data from Autosys, XML, Informatica, Java and database views using Scala.
  • Used REST APIs with Python to ingest data from external sources into BigQuery.
  • Built a configurable Scala and Spark based framework to connect to common data sources such as MySQL, Oracle, Postgres, and SQL Server and load the data into BigQuery.
  • Implemented automation, traceability, and transparency for every step of the process to build trust in data and streamline data science efforts using Python, Java, Hadoop Streaming, Apache Spark, Spark SQL, Scala, Hive, and Pig.
  • Worked on NiFi data pipelines to process large data sets and configured lookups for data validation and integrity.
  • Designed a highly efficient data model for optimizing large-scale queries using Hive complex data types and the Parquet file format.
  • Performed data validation and transformation using Python and Hadoop Streaming.
  • Developed highly efficient Pig Java UDFs utilizing advanced concepts such as the Algebraic and Accumulator interfaces to populate Confidential benchmark cube metrics.
  • Loaded data from different sources (Teradata and DB2) into HDFS using Sqoop and then into partitioned Hive tables.
  • Developed bash scripts to fetch TLOG files from an FTP server and process them for loading into Hive tables.
  • Automated workflows using shell scripts and Control-M jobs to pull data from various databases into Hadoop Data Lake.
  • Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
  • Overwrote the Hive data with HBase data daily to keep it fresh, and used Sqoop to load data from DB2 into the HBase environment.
  • Expertise in Snowflake for creating and maintaining tables and views.
  • Opened SSH tunnels to Google Dataproc to access the YARN manager and monitor Spark jobs.
  • Staged job files with gsutil and submitted Spark jobs for execution on the Dataproc cluster.
  • Converted Hive/SQL queries into Spark transformations using Spark RDDs and Scala; good experience using Spark Shell and Spark Streaming.
  • Designed, developed and maintained Big Data streaming and batch applications using Storm.
  • Created Hive, Phoenix, and HBase tables, and HBase-integrated Hive tables, per the design using ORC/Parquet file formats and Snappy compression.
  • Created Sqoop jobs and Pig and Hive scripts for data ingestion from relational databases to compare with historical data.
  • Created HBase tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX, NoSQL, and a variety of portfolios.
  • Wrote a Python program to maintain raw file archival in a GCS bucket.
  • Analyzed various raw file types (JSON, CSV, XML) with Python using pandas and NumPy.
  • Wrote Scala programs for Spark transformations on Dataproc.
  • Used Cloud Functions with Python to load data into BigQuery as CSV files arrive in a GCS bucket (a Cloud Function sketch follows this list).
  • Processed and loaded bounded and unbounded data from Google Pub/Sub topics into BigQuery using Cloud Dataflow with Python (a Beam/Dataflow sketch also follows this list).
  • Created firewall rules to access Google Dataproc from other machines.
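
The Cloud Function work described above (loading CSV files into BigQuery as they arrive in a GCS bucket) could take roughly the shape sketched below. This is an assumed outline rather than the project's actual code; the dataset, table, and file-matching rules are placeholders.

    from google.cloud import bigquery

    def load_csv_to_bq(event, context):
        """Background Cloud Function triggered by a GCS object-finalize event."""
        bucket_name = event["bucket"]
        object_name = event["name"]
        if not object_name.endswith(".csv"):
            return  # ignore non-CSV arrivals

        client = bigquery.Client()
        job_config = bigquery.LoadJobConfig(
            source_format=bigquery.SourceFormat.CSV,
            skip_leading_rows=1,
            autodetect=True,
            write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
        )
        uri = f"gs://{bucket_name}/{object_name}"
        load_job = client.load_table_from_uri(
            uri, "example_dataset.landing_table", job_config=job_config)
        load_job.result()  # wait for the load job to finish

Such a function would be deployed with a google.storage.object.finalize trigger on the landing bucket.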
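
The Dataflow work in the same list (streaming bounded and unbounded data from Pub/Sub into BigQuery with Python) typically follows the Apache Beam pattern sketched below; the project, topic, table, and schema names are hypothetical.

    import json

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    def run():
        # streaming=True marks the pipeline as unbounded; runner and project
        # flags would normally be supplied on the command line.
        options = PipelineOptions(streaming=True)

        with beam.Pipeline(options=options) as p:
            (p
             | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
                   topic="projects/example-project/topics/example-topic")
             | "ParseJson" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
             | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
                   "example-project:example_dataset.events",
                   schema="event_id:STRING,event_ts:TIMESTAMP,payload:STRING",
                   create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
                   write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))

    if __name__ == "__main__":
        run()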

Environment: Hadoop, HDFS, Spark, Storm, Kafka, MapReduce, Hive, Snowflake, Machine Learning, Pig, Sqoop, Oozie, DB2, Java, Python, Splunk, UNIX Shell Scripting, GCP, GCS, BigQuery, Scala

Confidential - Boise, Idaho

Big Data Developer

Responsibilities:

  • Used Agile methodology in developing the application, which included iterative application development, weekly Sprints, stand up meetings and customer reporting backlogs.
  • Wrote technical design documents based on the data mapping and functional details of the tables.
  • Extracted batch and real-time data from DB2, Oracle, SQL Server, Teradata, and Netezza into Hadoop (HDFS) using Teradata TPT, Sqoop, Apache Kafka, and Apache Storm.
  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java and NiFi flows for data cleaning and preprocessing.
  • Handled importing of data from various data sources, performed transformations using Hive, MapReduce, and Spark, and loaded data into HDFS.
  • Designed and built ETL workflows, leading the effort to program data extraction from various sources into the Hadoop filesystem; implemented end-to-end ETL workflows using Teradata, SQL, TPT, and Sqoop and loaded the data into Hive data stores.
  • Analyzed and developed programs for Hadoop ingest processes based on the required logic and data load type, using tools such as Sqoop, Spark, Scala, Kafka, and Unix shell scripts.
  • Designed the incremental and historical extract logic to load data from flat files on various servers into the Massive Event Logging Database (MELD).
  • Developed Apache Spark jobs for data cleansing and preprocessing.
  • Wrote Spark programs to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Used Scala to write programs for faster testing and processing of data.
  • Wrote code and created Hive jobs to parse logs and structure them in tabular format to facilitate effective querying of the log data.
  • Automated ETL tasks and data workflows for the ingest pipeline through the UC4 scheduling tool.
  • Solved performance issues in Hive and Pig scripts with an understanding of joins, grouping, and aggregation and how they translate to MapReduce jobs.
  • Worked on ingestion of structured, semi-structured, and unstructured data into the Hadoop ecosystem using big data tools.
  • Selected and integrated the big data tools and frameworks required to bring new software engineering tools into existing structures; completed modifications, refactoring, and bug fixes to existing functionality.
  • Developed UDFs in Java as and when necessary for use in Pig and Hive queries.
  • Implemented partitioning and bucketing in Hive to design both managed and external tables and optimize performance (a DDL sketch follows this list).
  • Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
  • Involved in the complete cycle of migrating physical Linux/Windows machines to the cloud (AWS) and testing them.
  • Used ORC and Parquet file formats in Hive; experience working with UNIX/Linux to process large data sets.
  • Developed efficient Pig and Hive scripts with joins on datasets using various techniques.
  • Wrote documentation of program development, subsequent revisions, and coded instructions in the project's GitHub repository.
  • Worked closely with the data science team to analyze large data sets, gain an understanding of the data, discover data anomalies by writing the relevant code, and look for ways to leverage the data.
  • Assisted with the analysis of data used for Tableau reports and the creation of dashboards.
  • Participated with deployment teams to implement BI code and validate code implementation in different environments (Dev, Stage, and Production).
  • Provided deployment support, including change management and preparation of deployment instructions.
  • Prepared release notes and validation documents for user stories deployed to production as part of each release.
  • Updated Rally regularly to reflect the current status of the project.
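
As a sketch of the Hive partitioning and bucketing work referenced above, the DDL below shows how a partitioned, bucketed external table might be defined. It is issued here through PySpark with Hive support only to keep the document's examples in one language; the database, columns, bucket count, and HDFS location are illustrative, not taken from the project.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-ddl-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # External table partitioned by load date and bucketed by customer id;
    # every identifier and the location are placeholders.
    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS sales.transactions (
            txn_id      STRING,
            customer_id STRING,
            amount      DECIMAL(12,2)
        )
        PARTITIONED BY (txn_date STRING)
        CLUSTERED BY (customer_id) INTO 32 BUCKETS
        STORED AS ORC
        LOCATION 'hdfs:///data/curated/sales/transactions'
    """)

    spark.stop()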

Environment: RHEL, HDFS, MapReduce, Hive, Pig, Sqoop, Oozie, Teradata, Oracle SQL, UC4, Kafka, GitHub, Hortonworks Data Platform distribution, Spark, Scala.

Confidential - Newark, DE

Big Data Developer

Responsibilities:

  • Coordinated with the BI team to gather requirements for various data mining projects.
  • Configured Spark Streaming to get ongoing information from Kafka and stored the streamed information in HDFS and HBase.
  • Used Spark Streaming APIs to perform the necessary transformations and actions on the data received from Kafka and persist it into Cassandra.
  • Designed and developed data integration programs in a Hadoop environment with the NoSQL data store HBase for data access and analysis.
  • Used various Spark transformations and actions for cleansing the input data.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Used Spark Streaming APIs to perform transformations and actions on the fly for building a common learner data model, which gets data from Kafka in near real time and persists it to HBase.
  • Processed real-time streaming data by integrating Kafka and Flume with the Spark Streaming API.
  • Consumed JSON messages using Kafka and processed the JSON files using Spark Streaming to capture UI updates.
  • Experienced in writing live real-time processing and core jobs using Spark Streaming with Kafka as the data pipeline system.
  • Configured Spark Streaming to consume Kafka streams and store the information in HDFS (a Structured Streaming sketch follows this list).
  • Worked extensively with Sqoop for importing metadata from MySQL and assisted in exporting analysed data to relational databases using Sqoop.
  • Involved in the migration from on-premises infrastructure to the Azure cloud.
  • Worked on Hive optimization techniques using joins and subqueries, and used various functions to improve the performance of long-running jobs.
  • Optimized HiveQL by using Spark as the execution engine.
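
A minimal sketch of the Kafka-to-HDFS streaming flow described above is shown below. The original work used the Spark Streaming (DStream) APIs with Scala and also wrote to HBase and Cassandra; this PySpark Structured Streaming version only illustrates the Kafka-to-HDFS leg, and the brokers, topic, schema, and paths are placeholders.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StructType, StructField, StringType, TimestampType

    spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

    # Hypothetical schema for the JSON events on the topic.
    schema = StructType([
        StructField("user_id", StringType()),
        StructField("event_type", StringType()),
        StructField("event_ts", TimestampType()),
    ])

    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker1:9092")
           .option("subscribe", "ui-events")
           .load())

    events = (raw.selectExpr("CAST(value AS STRING) AS json")
              .select(F.from_json("json", schema).alias("e"))
              .select("e.*"))

    query = (events.writeStream
             .format("parquet")
             .option("path", "hdfs:///data/streams/ui_events/")
             .option("checkpointLocation", "hdfs:///checkpoints/ui_events/")
             .outputMode("append")
             .start())
    query.awaitTermination()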

Environment: Azure HDInsight, Apache Spark, Apache Kafka, Scala, PySpark, HBase, Hive, Sqoop, Flume, Hadoop, HDFS, Oozie, MySQL, Oracle 10g, MapReduce (MR1), Pig, Snowflake, Machine Learning, Cassandra, AWS, Talend, Java, Linux Shell Scripting

Confidential - Scottsdale, AZ

Hadoop Developer

Responsibilities:

  • Involved in design and development phases of Software Development Life Cycle (SDLC) using Scrum methodology.
  • Involved in Requirement gathering, Business Analysis and translated business requirements into Technical design in Hadoop and Big Data.
  • Imported and exported data between HDFS and databases using Sqoop.
  • Developed data pipelines using Flume, Sqoop, Pig, and Java MapReduce to ingest behavioral data into HDFS for analysis (a Hadoop Streaming sketch in Python follows this list).
  • Used Maven extensively for building JAR files of MapReduce programs and deploying them to the cluster.
  • Created a customized BI tool for the manager team that performs query analytics using HiveQL.
  • Created partitions and buckets based on state for further processing using bucket-based Hive joins.
  • Developed a suite of unit test cases for Mapper, Reducer, and Driver classes using the MR testing library.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig, and Sqoop.
  • Designed and implemented a Cassandra NoSQL based database that persists high-volume user profile data.
  • Migrated high-volume OLTP transactions from Oracle to Cassandra.
  • Created a data pipeline of MapReduce programs using chained mappers.
  • Implemented optimized joins on different data sets to get top claims by state using MapReduce.
  • Modeled Hive partitions extensively for data separation and faster data processing, and followed Pig and Hive best practices for tuning.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
  • Loaded the aggregated data onto DB2 for reporting on the dashboard.
  • Used Pig as an ETL tool to perform transformations, event joins, filters, and some pre-aggregations before storing the data in HDFS.
  • Implemented optimization and performance tuning in Hive and Pig.
  • Developed job flows in Oozie to automate the workflow for extraction of data from warehouses and weblogs.
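
The MapReduce pipelines above were written in Java; purely as an illustrative sketch in the single example language used in this document, an equivalent Hadoop Streaming job in Python could pair a mapper and reducer as below (the tab-separated field layout is hypothetical).

    #!/usr/bin/env python3
    """Hadoop Streaming sketch: count behavioral events per page after basic cleansing."""
    import sys

    def mapper():
        for line in sys.stdin:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 3:          # cleansing step: drop malformed records
                continue
            page = fields[1]             # assumed layout: user_id, page, timestamp
            print(f"{page}\t1")

    def reducer():
        current_key, count = None, 0
        for line in sys.stdin:
            key, value = line.rstrip("\n").split("\t", 1)
            if key == current_key:
                count += int(value)
            else:
                if current_key is not None:
                    print(f"{current_key}\t{count}")
                current_key, count = key, int(value)
        if current_key is not None:
            print(f"{current_key}\t{count}")

    if __name__ == "__main__":
        mapper() if sys.argv[1] == "map" else reducer()

The script would be shipped with -files and invoked through the hadoop-streaming JAR as both the mapper ("events.py map") and the reducer ("events.py reduce").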

Environment: RHEL, HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Mahout, HBase, Hortonworks Data Platform distribution, Cassandra.

Confidential

Big Data/Hadoop Developer Intern

Responsibilities:

  • Performed data transformations like filtering, sorting, and aggregation using Pig.
  • Created Sqoop jobs to import data from SQL Server, Oracle, and Teradata into HDFS.
  • Created Hive tables to push the data to MongoDB.
  • Wrote complex aggregation queries in MongoDB for report generation.
  • Developed scripts to run scheduled batch cycles using Oozie and present data for reports.
  • Worked on a POC for building a movie recommendation engine based on Fandango ticket sales data using Scala and the Spark Machine Learning library.
  • Developed a big data ingestion framework to process multi-TB data, including data quality checks and transformation, stored in efficient formats such as Parquet and loaded into Amazon S3 using the Spark Scala API (a PySpark sketch follows this list).
  • Implemented automation, traceability, and transparency for every step of the process to build trust in data and streamline data science efforts using Python, Java, Hadoop Streaming, Apache Spark, Spark SQL, Scala, Hive, and Pig.
  • Designed a highly efficient data model for optimizing large-scale queries using Hive complex data types and the Parquet file format.
  • Performed data validation and transformation using Python and Hadoop Streaming.
  • Developed highly efficient Pig Java UDFs utilizing advanced concepts such as the Algebraic and Accumulator interfaces to populate Confidential benchmark cube metrics.
  • Loaded data from different sources (Teradata and DB2) into HDFS using Sqoop and then into partitioned Hive tables.
  • Developed bash scripts to fetch TLOG files from an FTP server and process them for loading into Hive tables.
  • Developed and optimized an algorithmic trading system and strategies using Vim and Git on Linux.
  • Automated workflows using shell scripts and Control-M jobs to pull data from various databases into the Hadoop data lake.
  • Involved in story-driven Agile development methodology and actively participated in daily scrum meetings.
  • Overwrote the Hive data with HBase data daily to keep it fresh, and used Sqoop to load data from DB2 into the HBase environment.
  • Converted Hive/SQL queries into Spark transformations using Spark RDDs and Scala; good experience using Spark Shell and Spark Streaming.
  • Designed, developed and maintained Big Data streaming and batch applications using Storm.
  • Created Hive, Phoenix, HBase tables and HBase integrated Hive tables as per the design using ORC file format and Snappy compression.
  • Developed Oozie workflows for daily incremental loads, which get data from Teradata and import it into Hive tables.
  • Created Sqoop jobs and Pig and Hive scripts for data ingestion from relational databases to compare with historical data.
  • Created HBase tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX, NoSQL, and a variety of portfolios.
  • Developed Pig scripts to transform the data into structured format, automated through Oozie coordinators.
  • Used Splunk to capture, index, and correlate real-time data in a searchable repository from which it can generate reports and alerts.
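
The ingestion framework above (data quality checks, transformation, Parquet, Amazon S3) was built with the Spark Scala API; a rough PySpark equivalent is sketched below, with placeholder paths, columns, and thresholds.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("ingest-quality-sketch").getOrCreate()

    # Source path and required columns are hypothetical.
    raw = spark.read.json("hdfs:///data/raw/events/")

    required = ["event_id", "event_ts", "amount"]
    clean = raw.dropDuplicates(["event_id"]).na.drop(subset=required)

    # Simple data quality gate: fail the run if too many rows were rejected.
    total, kept = raw.count(), clean.count()
    if total and kept / total < 0.95:
        raise ValueError(f"Data quality check failed: only {kept}/{total} rows passed")

    # Store the curated data as Parquet in S3, partitioned by load date.
    (clean.withColumn("load_date", F.current_date())
          .write.mode("append")
          .partitionBy("load_date")
          .parquet("s3a://example-curated-bucket/events/"))

    spark.stop()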

Environment: Hadoop, HDFS, Spark, Storm, Kafka, MapReduce, Hive, Pig, Sqoop, Oozie, DB2, Java, Python, Splunk, UNIX Shell Scripting.
