Data Engineer Resume

Burlington, MA

PROFESSIONAL SUMMARY:

  • Overall 8+ years of professional experience covering analysis, design, development, integration, deployment, and maintenance of quality software applications using Java/J2EE and Big Data/Hadoop technologies.
  • 4+ years of experience in Big Data implementations, with strong experience in major Hadoop ecosystem components such as MapReduce, HDFS, Hive, Pig, NiFi, Sqoop, and Spark.
  • Excellent knowledge of Hadoop architecture and ecosystem components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and YARN.
  • Good exposure to column-oriented NoSQL databases such as HBase and Cassandra.
  • Extensive experience working with structured data using HiveQL, join operations, and custom UDFs, and in optimizing Hive queries (a UDF sketch follows this list).
  • Experience importing and exporting data between HDFS and relational databases using Sqoop.
  • Experienced with job workflow scheduling and monitoring tools such as Oozie and Fuse.
  • Experience using various Hadoop distributions (Cloudera, Hortonworks, etc.) to fully implement and leverage new Hadoop features.
  • Experienced in performing analytics on structured data using Hive queries, joins, query tuning, SerDes, and UDFs.
  • Experience working with Spark features such as RDD transformations and Spark SQL.
  • Worked on custom Pig Loader and Storage classes to handle a variety of data formats such as JSON and compressed CSV.
  • Experience working with web development technologies such as HTML, CSS, JavaScript, and jQuery.
  • Proficient in Java, J2EE, JDBC, Collections, Servlets, JSP, Spring, Hibernate, JSON, XML, REST, SOAP web services, and EclipseLink.
  • Experienced in working with scripting technologies such as Python and UNIX shell scripts.
  • Experience with source control repositories such as SVN, CVS, and Git.
  • Strong experience working in UNIX/Linux environments and writing shell scripts.
  • Skilled at building and deploying multi-module applications using Maven and Ant, integrated with CI servers such as Jenkins.
  • Expertise in implementing MapReduce jobs with both the Java API and Apache Pig.
  • Experience working with SQL and SQL performance tuning.
  • Experienced in requirement analysis, application development, application migration, and maintenance using the Software Development Life Cycle (SDLC) and Java/J2EE technologies.
  • Developed and maintained web applications on the Apache Tomcat web server.
  • Hands-on experience developing web applications using the Spring Framework web module and integrating with the Struts MVC framework.
  • Experience writing database objects such as stored procedures, functions, triggers, PL/SQL packages, and cursors for Oracle, SQL Server, MySQL, and Sybase databases.
  • Great team player and quick learner with effective communication, motivation, and organizational skills, combined with attention to detail and a focus on business improvements.
  • Adequate knowledge of and working experience in Agile and Waterfall methodologies.
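
To illustrate the custom Hive UDF work mentioned above, a minimal GenericUDF sketch in Java is included below; the function name, class name, and behavior are hypothetical and not taken from any specific project in this resume.

```java
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.io.Text;

// Hypothetical generic UDF that trims and upper-cases a string column.
@Description(name = "clean_upper", value = "_FUNC_(str) - trims and upper-cases str")
public class CleanUpperUDF extends GenericUDF {

    @Override
    public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
        if (arguments.length != 1) {
            throw new UDFArgumentException("clean_upper expects exactly one argument");
        }
        // The result is returned as a writable string.
        return PrimitiveObjectInspectorFactory.writableStringObjectInspector;
    }

    @Override
    public Object evaluate(DeferredObject[] arguments) throws HiveException {
        Object value = arguments[0].get();
        if (value == null) {
            return null;
        }
        // Normalize whatever string representation the input produced.
        return new Text(value.toString().trim().toUpperCase());
    }

    @Override
    public String getDisplayString(String[] children) {
        return "clean_upper(" + children[0] + ")";
    }
}
```

After packaging the class into a JAR, it would typically be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being used in HiveQL queries.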

TECHNICAL SKILLS:

Big Data Ecosystem: HDFS, MapReduce, Apache NiFi, Hive, Pig, Sqoop, Oozie, Spark, ZooKeeper, Impala, YARN, RabbitMQ

Hadoop Distributions: Cloudera, Hortonworks

NoSQL Databases: HBase, Cassandra

Databases: Oracle, Vertica, MySQL, Teradata, PostgreSQL

Languages: C, Java, SQL, HQL, Python, Scala, Shell Scripting.

Web Technologies: HTML, XML, CSS, JSON, Node.js, and JavaScript.

Web Servers: Apache Tomcat.

Version Control: SVN, CVS, Git.

Operating Systems: Linux, UNIX, Mac OS X, CentOS, Windows 8, Windows 7, and Windows Server 2008/2003.

PROFESSIONAL EXPERIENCE:

Confidential, Burlington, MA

Data Engineer

Responsibilities:

  • Involved in the end-to-end process of loading subscriber data into tables and performing the sequence of steps to merge it with other data sources in Cloudera.
  • Worked on ETL processes using Hive with different execution engines such as MapReduce, Tez, and Spark.
  • Consolidated small files across large datasets using Spark (Scala) before creating tables on the data (see the sketch after this list).
  • Worked on loading large tables of around 400 GB in formats such as SequenceFile and Parquet.
  • Wrote scripts to continuously dump and reload data between Oracle and HDFS using Sqoop.
  • Implemented HQL scripts for creating Hive tables and for loading, analyzing, merging, binning, backfilling, and cleansing data.
  • Ingested the final table data from Cloudera into the Vertica server and loaded it into Vertica tables.
  • Developed Python and shell scripts to automate the end-to-end implementation process of the AI project.
  • Involved in the deployment process using Jenkins and worked closely on fixing version issues.
  • Responsible for the monthly data refresh for all MVPDs to generate advertising campaign data.
  • Designed and implemented partitioning (static and dynamic) and bucketing in Hive.
  • Worked on generating configs for the new InfoBase file, which has more than 1,000 columns.
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Responsible for managing data coming from different sources; involved in HDFS maintenance and data loading.
  • Worked on creating source repositories and data marts in the Data Transformation tool.
  • Documented the process and mentored the analyst and test teams on writing Hive and Impala queries.
  • Worked on binning and backfilling large volumes of data according to business logic.
  • Worked on documenting the DT process for different sources.
  • Involved in the full project life cycle, from design, analysis, and logical and physical architecture modeling through development, implementation, and testing.
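
The small-file consolidation and partitioned loading described above can be sketched as follows. The original work used Spark with Scala; this illustration uses the equivalent Spark Java Dataset API for consistency with the other examples, and the paths, column names, and table name are hypothetical.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

// Hypothetical job: read many small files, consolidate them into fewer
// output files, and write a partitioned Parquet layout that Hive can read.
public class ConsolidateSmallFiles {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("consolidate-small-files")
                .enableHiveSupport()
                .getOrCreate();

        // Source directory full of small CSV files (path is illustrative).
        Dataset<Row> raw = spark.read()
                .option("header", "true")
                .csv("hdfs:///data/subscribers/raw/");

        // coalesce() reduces the number of output files without a full shuffle.
        raw.coalesce(32)
                .write()
                .mode(SaveMode.Overwrite)
                .partitionBy("load_month")            // illustrative partition column
                .parquet("hdfs:///data/subscribers/consolidated/");

        // Optionally expose the consolidated data as an external Hive table.
        spark.sql("CREATE EXTERNAL TABLE IF NOT EXISTS subscribers_consolidated "
                + "(subscriber_id STRING) PARTITIONED BY (load_month STRING) "
                + "STORED AS PARQUET LOCATION 'hdfs:///data/subscribers/consolidated/'");
        spark.sql("MSCK REPAIR TABLE subscribers_consolidated"); // register partitions

        spark.stop();
    }
}
```

coalesce() is used rather than repartition() because it lowers the output file count without forcing a full shuffle of the data.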

Environment: Hadoop, HDFS, Hive, Spark, Shell Scripting, Python, Jenkins, Groovy, Tomcat, PostgreSQL, Stash, Cloudera, Oracle, Vertica, Sqoop.

Confidential, Cincinnati, OH

Big Data/Hadoop Developer

Responsibilities:

  • Responsible for managing data coming from different sources; involved in HDFS maintenance and loading of structured and unstructured data.
  • Developed a data ingestion service from a remote SaaS server into HDFS.
  • Involved in migration of HDFS files from the old Dunnhumby cluster to the new Confidential domain cluster.
  • Developed flow XML files using Apache NiFi, a workflow automation tool, to ingest data into HDFS.
  • Worked on performance tuning of Apache NiFi workflows to optimize data ingestion speeds.
  • Used RabbitMQ as a messaging service to notify downstream consumers about ingested data (see the sketch after this list).
  • Involved in end-to-end implementation of ETL logic.
  • Pre-processed the ingested data using Apache Pig to eliminate bad records per business requirements with the help of filter functions and user-defined functions.
  • Worked on an Apache Spark component in Scala to perform transformations such as normalization and standardization and to convert raw data to Apache Parquet.
  • Migrated ETL transformations using Pig Latin scripts, transformations, and join operations.
  • Worked on creating Hive tables for executing HQL queries per client requirements.
  • Optimized Apache Hive queries using partitioning on asofmonth for performance improvement.
  • Moved data from different sources into Hadoop and defined detailed technical processes for data acquisition.
  • Developed simple to complex MapReduce jobs using Hive and Pig.
  • Developed Oozie workflows for automating Hive, Pig, and Spark jobs.
  • Configured TeamCity for continuous integration/continuous deployment (CI/CD) of code onto the edge node.
  • Monitored workload, job performance, and capacity planning using Cloudera Manager.
  • Responsible for writing validation scripts using shell scripting.
  • Used GitHub as the version control repository.
  • Worked on XML transformation using XSLT.
  • Participated and contributed to estimations and project planning with the team and architects.
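
A minimal sketch of the kind of RabbitMQ notification described above is shown below, using the standard RabbitMQ Java client; the queue name, host, and message payload are hypothetical.

```java
import java.nio.charset.StandardCharsets;

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

// Hypothetical notifier: publish a small JSON message once an ingest run
// lands a dataset in HDFS, so downstream jobs know new data is available.
public class IngestNotifier {
    private static final String QUEUE = "ingest.completed"; // illustrative queue name

    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("rabbitmq.internal");                // illustrative host

        try (Connection connection = factory.newConnection();
             Channel channel = connection.createChannel()) {
            // Durable queue so notifications survive a broker restart.
            channel.queueDeclare(QUEUE, true, false, false, null);

            String payload = "{\"dataset\":\"weblogs\",\"hdfsPath\":\"/data/weblogs/latest\"}";
            channel.basicPublish("", QUEUE, null, payload.getBytes(StandardCharsets.UTF_8));
        }
    }
}
```

Downstream jobs would consume from the same queue with basicConsume and trigger their processing when a message arrives.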

Environment: Hadoop, HDFS, Apache NiFi, RabbitMQ, Oozie, Pig, Spark, Scala, Hive, Shell Scripting, Linux, Cloudera 5.4.

Confidential, Columbus, OH

Big Data/Hadoop Developer

Responsibilities: 

  • Involved in the full project life cycle, from design, analysis, and logical and physical architecture modeling through development, implementation, and testing.
  • Worked with Linux systems and RDBMS databases on a regular basis to ingest data using Sqoop.
  • Created Hive tables and loaded transactional data from Teradata using Sqoop.
  • Created and ran Sqoop jobs with incremental load to populate Hive external tables.
  • Worked with very large datasets distributed across large clusters.
  • Developed optimal strategies for distributing the web log data over the cluster; imported and exported the stored web log data into HDFS and Hive using Sqoop.
  • Wrote Pig scripts to transform raw data from several data sources into baseline data.
  • Developed Oozie workflows to automate the tasks of loading data into HDFS and pre-processing it with Pig.
  • Implemented Hive generic UDFs to incorporate business logic into Hive queries.
  • Analyzed the web log data using HiveQL to extract the number of unique visitors per day, page views, visit duration, and the most visited pages on the website (see the sketch after this list).
  • Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as MapReduce, Pig, Hive, and Sqoop) as well as system-specific jobs (such as Java programs and shell scripts).
  • Created Hive tables and worked on them using HiveQL.
  • Built applications using Maven and integrated them with CI servers such as Jenkins to run build jobs.
  • Exported the analyzed data to the RDBMS using Sqoop to generate reports for the BI team.
  • Worked collaboratively with all levels of business stakeholders to architect, implement, and test a Big Data analytical solution from disparate sources.
  • Created and maintained technical documentation for executing Hive queries, Pig scripts, and Sqoop jobs.
  • Involved in Agile methodologies, daily scrum meetings, and sprint planning.
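
The web-log analysis above can be illustrated with a small Java program that runs a HiveQL aggregation over HiveServer2 JDBC; the host, database, table, and column names are assumptions for the sketch, not actual project identifiers.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Hypothetical report: count unique visitors per day from a web-log Hive table.
public class DailyUniqueVisitors {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        String url = "jdbc:hive2://hiveserver2.internal:10000/weblogs"; // illustrative host/db
        try (Connection conn = DriverManager.getConnection(url, "etl_user", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "SELECT log_date, COUNT(DISTINCT visitor_id) AS unique_visitors "
                   + "FROM web_log GROUP BY log_date ORDER BY log_date")) {
            while (rs.next()) {
                // Print one line per day: date <tab> unique visitor count.
                System.out.println(rs.getString("log_date") + "\t" + rs.getLong("unique_visitors"));
            }
        }
    }
}
```

Page views, visit duration, and most-visited pages would follow the same pattern with different GROUP BY and aggregate expressions.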

Environment: Hadoop, HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Oozie, Maven, Shell Scripting, Cloudera.

Confidential, Reston, VA

Java/Hadoop Developer

Responsibilities: 

  • Responsible for managing data coming from different sources; involved in HDFS maintenance and data loading.
  • Involved in creating Hive tables and loading and analyzing data using Hive queries.
  • Developed simple to complex MapReduce jobs using Hive and Pig.
  • Developed Oozie workflows to automate the tasks of loading data into HDFS and pre-processing it with Pig.
  • Used Oozie operational services for batch processing and scheduling workflows dynamically.
  • Developed MapReduce programs to parse the raw data and store the refined data in tables (see the sketch after this list).
  • Worked with application teams to install the operating system, Hadoop updates, patches, and Hortonworks version upgrades as required.
  • Developed multiple MapReduce jobs for data cleaning and preprocessing.
  • Worked with Linux systems and RDBMS databases on a regular basis to ingest data using Sqoop.
  • Developed the application using the Struts framework, which leverages the classical Model-View-Controller (MVC) architecture; UML diagrams such as use cases, class diagrams, interaction diagrams, and activity diagrams were used.
  • Participated in requirement gathering and converting the requirements into technical specifications.
  • Developed and maintained complex outbound notification applications that run on custom architectures, using diverse technologies including Core Java, J2EE, XML, JBoss, and web services.
  • Ran Hadoop streaming jobs to process terabytes of XML-format data.
  • Generated datasets and loaded them into the Hadoop ecosystem.
  • Assisted in exporting analyzed data to relational databases using Sqoop.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries, Pig scripts, and Sqoop jobs.
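
A minimal MapReduce parse-and-count sketch in Java is shown below as an illustration of the raw-data parsing work above; the input layout (tab-delimited with a leading date field) and class names are assumptions.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical job: parse each raw record, extract its date field, and
// count records per date, skipping malformed rows.
public class RecordCountByDate {

    public static class ParseMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);
        private final Text date = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\t");
            if (fields.length > 0 && !fields[0].isEmpty()) {   // skip malformed rows
                date.set(fields[0]);
                context.write(date, ONE);
            }
        }
    }

    public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            long sum = 0;
            for (LongWritable v : values) {
                sum += v.get();
            }
            context.write(key, new LongWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "record-count-by-date");
        job.setJarByClass(RecordCountByDate.class);
        job.setMapperClass(ParseMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```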

Environment: Hortonworks, J2EE, SQL, WebSphere, HTML, XML, ANT, Oracle, JavaScript, Hadoop, HDFS, Oozie, Pig, Hive, MapReduce, Sqoop, Flume, and Linux.

Confidential

J2EE Developer

Responsibilities:

  • Actively participated in requirements gathering, analysis, design, and testing phases.
  • Designed use case diagrams, class diagrams, and sequence diagrams as a part of Design Phase.
  • Strong experience in SQL and knowledge of Oracle and MS SQL Server databases.
  • Experience in ETL software development.
  • Created logical and physical data models and generated the SQL for the design using Erwin.
  • Developed Enterprise JavaBeans (stateless session beans) to handle transactions such as online funds transfer and bill payments to the service providers.
  • Implemented Service-Oriented Architecture (SOA) using JMS for sending and receiving messages while creating web services (see the sketch after this list).
  • Developed XML documents and generated XSL files for Payment Transaction and Reserve Transaction systems.
  • Developed SQL queries and stored procedures.
  • Developed Web Services for data transfer from client to server and vice versa using Apache Axis, SOAP and WSDL.
  • Used the JUnit framework for unit testing of all the Java classes.
  • Implemented various J2EE Design patterns like Singleton, Service Locator, DAO, and SOA.
  • Worked on AJAX to develop an interactive web application and used JavaScript for data validations.
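
The JMS messaging in the SOA work above can be sketched with a simple JMS 1.1 queue sender; the JNDI names and message body are hypothetical, and in the actual application the producer would typically live inside a session bean rather than a main method.

```java
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.jms.TextMessage;
import javax.naming.InitialContext;

// Hypothetical JMS sender: look up the connection factory and queue via
// JNDI and publish a payment transaction message.
public class PaymentMessageSender {
    public static void main(String[] args) throws Exception {
        InitialContext ctx = new InitialContext();
        ConnectionFactory factory = (ConnectionFactory) ctx.lookup("jms/ConnectionFactory");
        Queue queue = (Queue) ctx.lookup("jms/PaymentQueue");       // illustrative JNDI names

        Connection connection = factory.createConnection();
        try {
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer = session.createProducer(queue);

            TextMessage message = session.createTextMessage(
                    "<paymentTransaction><amount>125.00</amount></paymentTransaction>");
            producer.send(message);
        } finally {
            connection.close();
        }
    }
}
```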

Environment: J2EE, JDBC, Java, SQL, JSP, Struts, Hibernate, Web Services, SOAP, WSDL, Design Patterns, MVC, HTML, JavaScript, WebLogic, XML, JUnit, WebSphere, MyEclipse
