
Big Data Engineer Resume


Rahway, NJ

PROFESSIONAL SUMMARY:

  • 8+ years of IT experience in software analysis, design, development, testing and implementation of Big Data, Hadoop, NoSQL and Java/J2EE technologies.
  • Experienced in installing, configuring and using Apache Hadoop ecosystem components such as MapReduce, Hive, Pig, Sqoop, Flume, YARN, Spark, Kafka and Oozie.
  • Strong understanding of Hadoop daemons and MapReduce concepts.
  • Strong experience in importing and exporting data to and from HDFS.
  • Expertise in Java and Scala.
  • Experienced in developing UDFs for Hive using Java.
  • Worked with Apache Falcon, a data governance engine that defines, schedules, and monitors data management policies.
  • Experience in Amazon AWS services such as EMR, EC2, S3, CloudFormation, Redshift, and DynamoDB for fast and efficient processing of Big Data.
  • Hands-on experience with Hadoop, HDFS, MapReduce and the Hadoop ecosystem (Pig, Hive, Oozie, Flume and HBase).
  • Good experience with data transformation and storage: HDFS, MapReduce, Spark.
  • Hands-on experience in developing Spark applications using RDD transformations, Spark Core, Spark Streaming and Spark SQL (see the sketch after this list).
  • Strong knowledge of NoSQL databases such as HBase, MongoDB and Cassandra.
  • Experience in working with Angular 4, Node.js, Bookshelf, Knex, and MariaDB.
  • Understanding of data storage and retrieval techniques, ETL, and databases, including graph stores, relational databases and tuple stores.
  • Good skills in developing reusable solutions that maintain consistent coding standards across different Java projects.
  • Good knowledge of Python collections, Python scripting and multi-threading.
  • Wrote multiple MapReduce programs in Python for data extraction, transformation and aggregation from multiple file formats including XML, JSON, CSV and other compressed formats.
  • Experience with Java/J2EE technologies including EJB, Hibernate, Java Web Services (SOAP, REST), Java Threads, Java Sockets, Java Servlets, JSP and JDBC.
  • Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, Scikit-learn and NLTK in Python to develop machine learning models such as linear regression.
  • Expertise in debugging and performance tuning of Oracle and Java applications, with strong knowledge of Oracle 11g and SQL.
  • Ability to work effectively in cross-functional team environments and experience working directly with business users.
  • Good experience in using Sqoop for traditional RDBMS data pull.
  • Good working knowledge of Flume.
  • Worked with Apache Ranger console to create and manage policies for access to files, folders, databases, tables, or columns.
  • Worked with the YARN Queue Manager to allocate queue capacities for different service accounts.
  • Hands-on experience with Hortonworks and Cloudera Hadoop environments.
  • Familiar with handling complex data processing jobs using Cascading.
  • Strong database skills in IBM DB2 and Oracle; proficient in database development, including constraints, indexes, views, stored procedures, triggers and cursors.
  • Extensive experience in Shell scripting.
  • Led testing efforts in support of projects/programs across a large landscape of technologies (Unix, AngularJS, AWS, Sauce Labs, Cucumber JVM, MongoDB, GitHub, SQL, NoSQL databases, APIs, Java, Jenkins).
  • Automated testing using Cucumber JVM to develop a world-class ATDD process.
  • Set up JDBC connections for database testing using the Cucumber framework.
  • Experience in component design using UML: use case, class, sequence, deployment and component diagrams for the requirements.
  • Expertise in installation, configuration, support and management of Hadoop clusters using Apache, Cloudera (CDH3, CDH4) and Hortonworks distributions, and on Amazon Web Services (AWS).
  • Excellent analytical and programming abilities in using technology to create flexible and maintainable solutions for complex development problems.
  • Good communication and presentation skills, willing to learn, adapt to new technologies and third party products.
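
The Spark experience above can be illustrated with a minimal sketch in Scala. It is not taken from any of the projects listed; the HDFS path, record layout and column names are illustrative assumptions, and it only shows the RDD-transformation and Spark SQL style referenced in the summary.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: RDD transformations plus a Spark SQL query.
// The path, the comma-delimited layout and the column names are assumptions.
object ClickstreamSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("clickstream-sketch").getOrCreate()
    import spark.implicits._

    // RDD transformations: parse raw lines and count requests per user.
    val raw = spark.sparkContext.textFile("hdfs:///data/raw/clickstream")
    val perUser = raw
      .map(_.split(","))
      .filter(_.length >= 3)                     // drop malformed rows
      .map(fields => (fields(0), 1))             // (userId, 1)
      .reduceByKey(_ + _)                        // pair-RDD aggregation

    // Spark SQL: the same data as a DataFrame with a declarative query.
    val df = perUser.toDF("user_id", "requests")
    df.createOrReplaceTempView("requests_per_user")
    spark.sql(
      "SELECT user_id, requests FROM requests_per_user ORDER BY requests DESC LIMIT 10"
    ).show()

    spark.stop()
  }
}
```

The same pattern works interactively in spark-shell or as a packaged application submitted with spark-submit.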

TECHNICAL SKILLS:

Languages/Tools: Java, C, C++, C#, Scala, VB, XML, HTML/XHTML, HDML, DHTML.

Big Data Technologies: HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Oozie, ZooKeeper, Spark, Mahout, Kafka, Storm, Cassandra, Solr, Impala, Greenplum, MongoDB

Web/Distributed Technologies: J2EE, Servlets, JSP, Struts, Hibernate, JSF, JSTL, EJB, RMI, JNI, XML, JAXP, XSL, XSLT, UML, MVC, Spring, CORBA, Java Threads.

Browser Languages/Scripting: HTML, XHTML, CSS, XML, XSL, XSD, XSLT, JavaScript, HTML DOM, DHTML, AJAX.

App/Web Servers: IBM WebSphere …, BEA WebLogic 5.1/7.0, JDeveloper, Apache Tomcat, JBoss.

GUI Environment: Swing, AWT, Applets.

Messaging & Web Services Technology: SOAP, WSDL, UDDI, XML, SOA, JAX-RPC, IBM WebSphere MQ v5.3, JMS.

Testing & Case Tools: JUnit, Log4j, Rational ClearCase, CVS, ANT, Maven, JBuilder.

Build Tools: CVS, Subversion, Git, Ant, Maven, Gradle, Hudson, TeamCity, Jenkins, Chef, Puppet, Ansible, Docker.

Scripting Languages: Python, Shell (Bash), Perl, PowerShell, Ruby, Groovy.

Monitoring Tools: Nagios, CloudWatch, JIRA, Bugzilla and Remedy.

Databases: Oracle, MS SQL Server 2000, DB2, MS Access, MySQL, Teradata, Greenplum; NoSQL: Cassandra, MongoDB.

Operating Systems: Windows, Unix, Sun Solaris, Linux (Red Hat 5.x/6.x/7.x, SUSE Linux 10), Ubuntu, CentOS.

PROFESSIONAL EXPERIENCE:

Confidential, Rahway, NJ

Big Data Engineer

Responsibilities:

  • As a Big Data/Hadoop developer, worked on Hadoop ecosystem components including Hive, MongoDB, ZooKeeper and Spark Streaming on the MapR distribution.
  • Developed Big Data solutions focused on pattern matching and predictive modelling.
  • Involved in Agile methodologies, daily scrum meetings and sprint planning.
  • Performed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
  • Worked on MongoDB and HBase, databases which differ from classic relational databases.
  • Involved in converting HiveQL into Spark transformations using Spark RDDs and Scala programming (see the sketch after this list).
  • Integrated Kafka with Spark Streaming for high-throughput, reliable data processing.
  • Worked on Apache Flume for collecting and aggregating large amounts of log data and stored it on HDFS for further analysis.
  • Tuned Hive and Pig to improve performance and resolved performance issues in both sets of scripts.
  • Built Hadoop solutions for big data problems using MR1 and MR2 on YARN.
  • Developed NiFi flows dealing with various kinds of data formats such as XML, JSON, and Avro.
  • Developed and designed data integration and migration solutions in Azure.
  • Worked on a proof of concept with Spark, Scala and Kafka.
  • Worked on visualizing the aggregated datasets in Tableau.
  • Worked on importing data from HDFS to a MySQL database and vice versa using Sqoop.
  • Implemented MapReduce jobs in Hive by querying the available data.
  • Configured the Hive metastore with MySQL, which stores the metadata for Hive tables.
  • Performed performance tuning of Hive queries and MapReduce programs for different applications.
  • Proactively involved in ongoing maintenance, support and improvements of the Hadoop cluster.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Handled importing of data from various data sources, performed transformations using Hive and Pig, and loaded data into HDFS.
  • Involved in identifying job dependencies to design workflows for Oozie and YARN resource management.
  • Designed solutions for various system components using Microsoft Azure.
  • Worked on data ingestion using Sqoop from HDFS to relational database systems and vice versa, including maintenance and troubleshooting of the jobs.
  • Explored Spark to improve the performance and optimization of the existing algorithms in Hadoop using Spark context, Spark SQL, DataFrames and pair RDDs.
  • Created Hive Tables, loaded claims data from Oracle using Sqoop and loaded the processed data into target database.
  • Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
  • Exported data from HDFS to RDBMS via Sqoop for Business Intelligence, visualization and user report generation.
  • Used Cloudera Manager for installation and management of the Hadoop cluster.
  • Developed a data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioural data and financial histories into HDFS for analysis.
  • Collaborated with business users/product owners/developers to contribute to the analysis of functional requirements.
  • Primarily involved in Data Migration process using Azure by integrating with Github repository and Jenkins.
  • Upgraded the Hadoop Cluster from CDH3 to CDH4, setting up High Availability Cluster and integrating Hive with existing applications.
  • Designed and developed a flattened view (merged and flattened dataset), de-normalizing several datasets in Hive/HDFS, which consists of key attributes consumed by the business and other downstream systems.
  • Worked on NoSQL support for enterprise production, loading data into HBase using Impala and Sqoop.
  • Worked on analysing Hadoop cluster and different big data analytic tools including Pig, HBase database and Sqoop.
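
As referenced above, here is a minimal sketch of converting a HiveQL aggregation into Spark transformations with Scala. The claims table, column names and aggregation are illustrative assumptions, not the project's actual schema or logic.

```scala
import org.apache.spark.sql.SparkSession

// Sketch: the HiveQL query
//   SELECT member_id, SUM(claim_amount) FROM claims GROUP BY member_id
// expressed both through Spark SQL and as a pair-RDD transformation.
// Table and column names are placeholders for illustration.
object HiveToSparkSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-to-spark-sketch")
      .enableHiveSupport()                       // read Hive tables via the metastore
      .getOrCreate()

    // 1. Run the original HiveQL unchanged through Spark SQL.
    val bySql = spark.sql(
      "SELECT member_id, SUM(claim_amount) AS total FROM claims GROUP BY member_id")

    // 2. The equivalent RDD-level transformation on the same Hive table.
    val byRdd = spark.table("claims").rdd
      .map(row => (row.getAs[String]("member_id"), row.getAs[Double]("claim_amount")))
      .reduceByKey(_ + _)

    bySql.show(10)
    byRdd.take(10).foreach(println)
    spark.stop()
  }
}
```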

Environment: Agile, Hadoop 3.0, Pig 0.17, HBase, Sqoop, Azure, Hive 2.3, HDFS, NoSQL, Impala, YARN, PL/SQL, NiFi, XML, JSON, Avro, Spark, Kafka, Tableau, MySQL, Apache Flume 2.3

Confidential, Piscataway, NJ

Big Data Engineer

Responsibilities:

  • Loaded data from different data sources (Oracle, SQL Server and DB2) into HDFS using Sqoop and loaded it into partitioned Hive tables.
  • Developed Hive UDFs to bring all customer email IDs into a structured format.
  • Developed bash scripts to pull the Tlog files from the FTP server and then process them for loading into Hive tables.
  • Used Storm and Kafka queues to send push messages to mobile devices.
  • Insert-overwrote the Hive data with HBase data daily to get fresh data every day, and used Sqoop to load data from DB2 into the HBase environment.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala; good experience using Spark Shell and Spark Streaming.
  • Designed, developed and maintained Big Data streaming and batch applications using Storm.
  • Imported millions of structured records from relational databases using Sqoop, processed them using Spark and stored the data in HDFS in CSV format.
  • Created Hive, Phoenix and HBase tables, and HBase-integrated Hive tables, as per the design using the ORC file format and Snappy compression.
  • Developed UDFs using both DataFrames/SQL and RDDs in Spark for data aggregation queries, reverting results into OLTP systems through Sqoop.
  • Scheduled all bash scripts using the Resource Manager scheduler.
  • Designed appropriate partitioning/bucketing schemes to allow faster data access during analysis using Hive.
  • Developed Oozie workflows for daily incremental loads, which get data from Teradata and import it into Hive tables.
  • Implemented a log producer in Scala that watches application logs, transforms incremental log entries and sends them to a Kafka and ZooKeeper based log collection platform (see the sketch after this list).
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDD, Scala and Python.
  • Worked on Sequence files, RC files, Map side joins, bucketing, partitioning for Hive performance enhancement and storage improvement.
  • Sqoop jobs, PIG and Hive scripts were created for data ingestion from relational databases to compare with historical data.
  • Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
  • Developed Pig scripts to transform the data into a structured format, automated through Oozie coordinators.
  • Used Splunk to capture, index and correlate real-time data in a searchable repository from which it can generate reports and alerts.
  • Built Jenkins jobs to support code deployment into production, and fixed post-production defects to get the MapReduce code to work as expected.
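
A minimal sketch, in Scala, of the kind of Kafka log producer described above. The broker address, topic name and log path are illustrative assumptions, and a production version would tail the file incrementally rather than re-reading it whole.

```scala
import java.util.Properties
import scala.io.Source
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

// Sketch of a log producer: read application log lines and publish them to Kafka.
// The broker, topic and file path below are placeholders, not the project's real values.
object LogProducerSketch {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    try {
      // A real watcher would tail the log and send only newly appended (incremental) lines.
      for (line <- Source.fromFile("/var/log/app/application.log").getLines())
        producer.send(new ProducerRecord[String, String]("app-logs", line))
    } finally {
      producer.close()                     // flushes buffered records before exit
    }
  }
}
```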

Environment: Hadoop, HDFS, Spark, Storm, Kafka, MapReduce, Hive, Pig, Sqoop, Oozie, DB2, Scala, Python, Splunk, UNIX Shell Scripting.

Confidential, New York, NY

Big Data Engineer

Responsibilities:

  • Extracted files from DB2 through Kettle, placed them in HDFS and processed them.
  • Analyzed large data sets by running Hive queries and Pig scripts.
  • Developed Sqoop scripts to handle the interaction between Hive and the Vertica database.
  • Involved in creating Hive tables and loading and analyzing data using hive queries.
  • Developed Simple to complex MapReduce Jobs using Hive and Pig.
  • Involved in running Hadoop jobs for processing millions of records of text data.
  • Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
  • Developed multiple MapReduce jobs in Java for data cleaning and pre-processing (see the sketch after this list).
  • Involved in unit testing using MRUnit for MapReduce jobs.
  • Involved in loading data from LINUX file system to HDFS.
  • Managed workflow and scheduling for complex MapReduce jobs using Apache Oozie.
  • Loaded data from multiple sources into AWS S3 cloud storage.
  • Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
  • Loaded and transformed large sets of structured and semi-structured data.
  • Assisted in exporting analyzed data to relational databases using Sqoop.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
  • Implemented data integrity and data quality checks in Hadoop using Hive and Linux scripts.
  • Used Avro and Parquet file formats for serialization of data.
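
The data-cleaning MapReduce jobs mentioned above were written in Java; the sketch below shows the same map-only cleaning pattern on the Hadoop MapReduce API, rendered in Scala to match the other sketches in this resume. The five-field, comma-delimited record layout is an assumption for illustration.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, NullWritable, Text}
import org.apache.hadoop.mapreduce.{Job, Mapper}
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat

// Map-only cleaning job: trims fields and drops malformed comma-delimited records.
class CleaningMapper extends Mapper[LongWritable, Text, NullWritable, Text] {
  override def map(key: LongWritable, value: Text,
                   context: Mapper[LongWritable, Text, NullWritable, Text]#Context): Unit = {
    val fields = value.toString.split(",", -1).map(_.trim)
    if (fields.length == 5 && fields.forall(_.nonEmpty))   // keep only complete rows
      context.write(NullWritable.get(), new Text(fields.mkString(",")))
  }
}

object CleaningJob {
  def main(args: Array[String]): Unit = {
    val job = Job.getInstance(new Configuration(), "record-cleaning")
    job.setJarByClass(classOf[CleaningMapper])
    job.setMapperClass(classOf[CleaningMapper])
    job.setNumReduceTasks(0)                                // map-only: no reduce phase
    job.setOutputKeyClass(classOf[NullWritable])
    job.setOutputValueClass(classOf[Text])
    FileInputFormat.addInputPath(job, new Path(args(0)))
    FileOutputFormat.setOutputPath(job, new Path(args(1)))
    System.exit(if (job.waitForCompletion(true)) 0 else 1)
  }
}
```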

Environment: Hadoop, Oozie, HDFS, Pig, Hive, MapReduce, AWS S3, Sqoop, LINUX, MRUnit

Confidential, Madison, WI

Hadoop Developer

Responsibilities:

  • Worked on Spark SQL to handle structured data in Hive.
  • Worked on creating Hive tables, loading data, writing Hive queries, and creating partitions and buckets for optimization.
  • Worked on migrating tables from RDBMS into Hive tables using Sqoop, and later generated visualizations using Tableau.
  • Worked on complex MapReduce programs to analyze data that exists on the cluster.
  • Analyzed substantial data sets by running Hive queries and Pig scripts.
  • Wrote Hive UDFs to sort struct fields and return complex data types.
  • Worked in AWS environment for development and deployment of custom Hadoop applications.
  • Created files and tuned SQL queries in Hive using HUE (Hadoop User Experience).
  • Worked on collecting and aggregating large amounts of log data using Storm and staging data in HDFS for further analysis.
  • Worked on Tableau to build customized Interactive reports, Worksheets and dashboards.
  • Managed real-time data processing and real-time data ingestion in MongoDB and Hive using Storm.
  • Developed Spark scripts using Python shell commands.
  • Stored the processed results in the data warehouse and maintained the data using Hive.
  • Experienced in working with the Spark ecosystem, using Spark SQL and Scala queries on different formats like text and CSV files (see the sketch after this list).
  • Created Oozie workflow and Coordinator jobs to kick off the jobs on time for data availability.
  • Worked with and learned a great deal from Amazon Web Services (AWS) cloud services like EC2, S3, and EMR.
  • Developed a proof of concept that uses Apache NiFi to ingest data from Kafka and convert raw XML data into JSON and Avro, and implemented NiFi flow topologies to perform cleansing operations before moving data into HDFS.
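
As referenced above, a minimal sketch of querying CSV data with Spark SQL and Scala; the file path, header/schema options and the grouped query are illustrative assumptions rather than the project's actual dataset.

```scala
import org.apache.spark.sql.SparkSession

// Sketch: load a CSV file into a DataFrame and query it with Spark SQL.
// The path and column names ("region", "amount") are placeholders for illustration.
object CsvSparkSqlSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("csv-spark-sql-sketch").getOrCreate()

    val sales = spark.read
      .option("header", "true")        // first line contains column names
      .option("inferSchema", "true")   // let Spark infer column types
      .csv("hdfs:///data/landing/sales.csv")

    sales.createOrReplaceTempView("sales")
    spark.sql(
      "SELECT region, SUM(amount) AS total_amount FROM sales " +
      "GROUP BY region ORDER BY total_amount DESC"
    ).show()

    spark.stop()
  }
}
```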

Environment: Cloudera, HDFS, Map Reduce, Storm, Hive, Pig, SQOOP, Apache Spark, Python, Accumulo, Oozie Scheduler, Kerberos, AWS, Tableau, Java, UNIX Shell scripts, HUE, NIFI, Git, Maven.

Confidential

Java Developer

Responsibilities:

  • Involved in SDLC requirements gathering, analysis, design, development and testing of an application developed using Agile methodology.
  • Actively participated in Object-Oriented Analysis and Design sessions of the project, which is based on MVC architecture using the Spring Framework.
  • Involved in daily Scrum meetings, sprint planning and estimation of tasks for the user stories, and participated in retrospectives and presenting the demo at the end of the sprint.
  • Developed the web module using Spring MVC and JSP.
  • Developed model logic using the Hibernate ORM framework and handled server-side validations.
  • Involved in Bug fixing.
  • Developed the application using Spring MVC architecture.
  • Developed JSP custom tags to support custom user interfaces.
  • Experienced in MS SQL Server 2005, writing stored procedures, SSIS packages, functions, triggers and views.
  • Developed front-end pages using JSP, HTML and CSS
  • Developed SQL queries using MySQL and established connectivity
  • Developed Stored Procedures, Triggers, Views, and Cursors using SQL Server 2005.
  • Used Stored Procedures for performing different database operations
  • Used Hibernate for interacting with Database.
  • Implemented exception handling for the application.
  • Designed sequence diagrams and use case diagrams for proper implementation.

Environment: JDK, JSP, HTML, CSS, JavaScript, MySQL, Spring, Hibernate, MySQL Server, Exception Handling, UML, Rational Rose.

Confidential

Java Developer

Responsibilities:

  • Set up the environment, including deploying the application and maintaining the web server.
  • Analyzed defects and provided fixes.
  • Developed Servlets and JavaServer Pages (JSP) to route submittals to the EJB components, and handled front-end validations with JavaScript.
  • Wrote KornShell build scripts for configuring and deployment of GUI application on UNIX machines.
  • Created Session Beans and controller Servlets for handling HTTP requests from JSP pages.
  • Involved in the development of test cases for the testing phase.
  • Designed and developed XSL style sheets using XSLT to transform XML and display the Customer Information on the screen for the user and for processing.
  • Extensively used Clear Case, the version control tool.
  • Performed End to end integration testing of online scenarios and unit testing using JUnit Testing Framework.

Environment: J2EE, Oracle, ClearCase, JDBC, UNIX, Junit, Eclipse, Struts, XML, XSLT, XPATH, XHTML, CSS, HTTP
