Data Engineer Resume
Burlington, MA
PROFESSIONAL SUMMARY:
- Overall 8+ years of professional experience covering analysis, design, development, integration, deployment, and maintenance of quality software applications using Java/J2EE technologies and Big Data Hadoop technologies.
- 4+ years of experience in Big Data implementations, with strong hands-on experience in major Hadoop ecosystem components such as MapReduce, HDFS, Hive, Pig, NiFi, Sqoop, and Spark.
- Excellent knowledge of Hadoop architecture and ecosystem components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and YARN.
- Good exposure to column-oriented NoSQL databases such as HBase and Cassandra.
- Extensive experience working with structured data using HiveQL, join operations, and custom UDFs, and in optimizing Hive queries (see the illustrative UDF sketch after this summary).
- Experience in importing and exporting data between HDFS and relational databases using Sqoop.
- Experienced with job workflow scheduling and monitoring tools such as Oozie and Fuse.
- Experience using various Hadoop distributions (Cloudera, Hortonworks, etc.) to fully implement and leverage new Hadoop features.
- Experienced in performing analytics on structured data using Hive queries, joins, query tuning, SerDes, and UDFs.
- Experience working with Spark features such as RDD transformations and Spark SQL.
- Worked on custom Pig loader and storage classes to handle a variety of data formats such as JSON and compressed CSV.
- Experience working with web development technologies such as HTML, CSS, JavaScript, and jQuery.
- Proficient in Java, J2EE, JDBC, Collections, Servlets, JSP, Spring, Hibernate, JSON, XML, REST, SOAP web services, and EclipseLink.
- Experienced in working with scripting technologies such as Python and UNIX shell scripts.
- Experience with source control repositories such as SVN, CVS, and Git.
- Strong experience working in UNIX/Linux environments and writing shell scripts.
- Skilled at building and deploying multi-module applications using Maven and Ant, integrated with CI servers such as Jenkins.
- Expertise in implementing MapReduce jobs with both the Java API and Apache Pig.
- Experience working with SQL and SQL performance tuning.
- Experienced in requirement analysis, application development, application migration, and maintenance using the Software Development Lifecycle (SDLC) and Java/J2EE technologies.
- Developed and maintained web applications using the Tomcat web server.
- Hands-on experience developing web applications using the Spring web module and integrating with the Struts MVC framework.
- Experience in writing database objects such as stored procedures, functions, triggers, PL/SQL packages, and cursors for Oracle, SQL Server, MySQL, and Sybase databases.
- Great team player and quick learner with effective communication, motivation, and organizational skills, combined with attention to detail and a focus on business improvements.
- Working knowledge of and experience with Agile and Waterfall methodologies.
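Illustrative code sketch (custom Hive UDF): a minimal simple UDF in Scala of the kind referenced in the Hive bullet above. The class name, column names, and jar path are hypothetical placeholders, not code from any engagement listed below.

    // Hypothetical example: a simple Hive UDF that masks the local part of an email.
    import org.apache.hadoop.hive.ql.exec.UDF
    import org.apache.hadoop.io.Text

    class MaskEmail extends UDF {
      // Hive resolves evaluate() by reflection for simple (non-generic) UDFs.
      def evaluate(input: Text): Text = {
        if (input == null) return null
        val parts = input.toString.split("@", 2)
        if (parts.length == 2) new Text("***@" + parts(1)) else new Text("***")
      }
    }

    // Usage in HiveQL after packaging the class into a jar (paths are placeholders):
    //   ADD JAR /path/to/udfs.jar;
    //   CREATE TEMPORARY FUNCTION mask_email AS 'MaskEmail';
    //   SELECT mask_email(email) FROM subscribers LIMIT 10;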
TECHNICAL SKILLS:
Big Data Ecosystems: HDFS, MapReduce, Apache NiFi, Hive, Pig, Sqoop, Oozie, Spark, ZooKeeper, Impala, YARN, RabbitMQ
Hadoop Distributions: Cloudera, Hortonworks
NoSQL Databases: HBase, Cassandra
Databases: Oracle, Vertica, MySQL, Teradata, PostgreSQL
Languages: C, Java, SQL, HQL, Python, Scala, Shell Scripting
Web Technologies: HTML, XML, CSS, JSON, Node.js, JavaScript
Web Servers: Apache Tomcat
Version Control: SVN, CVS, Git
Operating Systems: Linux, UNIX, Mac OS X, CentOS, Windows 8, Windows 7, Windows Server 2008/2003
PROFESSIONAL EXPERIENCE:
Confidential, Burlington, MA
Data Engineer
Responsibilities:
- Involved in the end-to-end process of loading subscriber data into tables and performing the sequence of steps needed to merge it with other data sources in Cloudera.
- Worked on ETL processes using Hive with different execution engines such as MapReduce, Tez, and Spark.
- Consolidated small files across large data sets using Spark (Scala) before creating tables on the data (see the first sketch after this section).
- Loaded large tables of around 400 GB in formats such as SequenceFile and Parquet.
- Wrote scripts to continuously dump and reload data between Oracle and HDFS using Sqoop.
- Implemented HQL scripts for creating Hive tables and for loading, analyzing, merging, binning, backfilling, and cleansing data in Hive.
- Ingested the final table data from Cloudera into the Vertica server and then loaded it into Vertica tables.
- Developed Python and shell scripts to automate the end-to-end implementation process of the AI project.
- Involved in the deployment process using Jenkins and worked closely on fixing version issues.
- Responsible for the monthly data refresh for all MVPDs to generate advertising campaign data.
- Designed and implemented static and dynamic partitioning and bucketing in Hive (see the second sketch after this section).
- Generated configs for the new InfoBase file, which has more than 1,000 columns.
- Explored Spark for improving the performance and optimization of existing Hadoop algorithms using Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Responsible for managing data coming from different sources; involved in HDFS maintenance and data loading.
- Created source repositories and data marts in the Data Transformation (DT) tool.
- Documented the process and mentored the analyst and test teams on writing Hive and Impala queries.
- Binned and backfilled huge amounts of data according to business logic.
- Documented the DT process for different sources.
- Involved in the full project life cycle: design, analysis, logical and physical architecture modeling, development, implementation, and testing.
Environment: Hadoop, HDFS, Hive, Spark, Shell Scripting, Python, Jenkins, Groovy, Tomcat, PostgreSQL, Stash, Cloudera, Oracle, Vertica, Sqoop.
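Code sketch (small-file consolidation): a minimal Spark (Scala) version of the consolidation step referenced above, which reads many small files, coalesces them into fewer partitions, and writes Parquet. Paths, options, and the partition count are hypothetical placeholders.

    // Hypothetical example: consolidate many small HDFS files into fewer, larger
    // Parquet files so a Hive table can be defined on top of them.
    import org.apache.spark.sql.SparkSession

    object ConsolidateSmallFiles {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("consolidate-small-files")
          .enableHiveSupport()
          .getOrCreate()

        // Read the raw landing area (many small CSV files).
        val raw = spark.read.option("header", "true").csv("/data/subscribers/raw/")

        // coalesce() reduces the number of output files without a full shuffle.
        raw.coalesce(32)
          .write
          .mode("overwrite")
          .parquet("/data/subscribers/consolidated/")

        spark.stop()
      }
    }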
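Code sketch (Hive partitioning and bucketing): a hedged illustration of static and dynamic partitioning plus a bucketed table, issued through spark.sql so it stays in the same language as the sketch above; it could equally be run as plain HQL in the Hive shell. Table and column names are hypothetical placeholders.

    // Hypothetical example: static/dynamic partitioning and bucketing in Hive.
    import org.apache.spark.sql.SparkSession

    object HivePartitioningSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("hive-partitioning-sketch")
          .enableHiveSupport()
          .getOrCreate()

        // Partitioned target table (Parquet, partitioned by month).
        spark.sql("""
          CREATE TABLE IF NOT EXISTS campaign_events (
            subscriber_id STRING,
            event_type    STRING,
            event_ts      TIMESTAMP)
          PARTITIONED BY (event_month STRING)
          STORED AS PARQUET""")

        // Static partition insert: the partition value is fixed in the statement.
        spark.sql("""
          INSERT OVERWRITE TABLE campaign_events PARTITION (event_month = '2018-01')
          SELECT subscriber_id, event_type, event_ts
          FROM staging_events
          WHERE date_format(event_ts, 'yyyy-MM') = '2018-01'""")

        // Dynamic partition insert: the partition value comes from the data itself.
        spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
        spark.sql("""
          INSERT OVERWRITE TABLE campaign_events PARTITION (event_month)
          SELECT subscriber_id, event_type, event_ts,
                 date_format(event_ts, 'yyyy-MM') AS event_month
          FROM staging_events""")

        // Bucketed table (DDL only): bucketed tables are typically populated from
        // the Hive side with hive.enforce.bucketing enabled.
        spark.sql("""
          CREATE TABLE IF NOT EXISTS subscriber_profile (
            subscriber_id STRING,
            attributes    MAP<STRING, STRING>)
          CLUSTERED BY (subscriber_id) INTO 32 BUCKETS
          STORED AS ORC""")

        spark.stop()
      }
    }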
Confidential, Cincinnati, OH
Big Data/Hadoop Developer
Responsibilities:
- Responsible for managing data coming from different sources; involved in HDFS maintenance and loading of structured and unstructured data.
- Developed a data ingestion service from a remote SaaS server into HDFS.
- Involved in migrating HDFS files from the old Dunnhumby domain to the new Confidential cluster.
- Developed flow XML files using Apache NiFi, a dataflow automation tool, to ingest data into HDFS.
- Worked on performance tuning of Apache NiFi workflows to optimize data ingestion speeds.
- Used RabbitMQ as a messaging service to notify downstream consumers about ingested data.
- Involved in the end-to-end implementation of ETL logic.
- Pre-processed the ingested data using Apache Pig to eliminate bad records per business requirements, with the help of filter functions and user-defined functions.
- Worked on an Apache Spark component in Scala to perform transformations such as normalization and standardization and to convert raw data to Apache Parquet (see the sketch after this section).
- Experienced in migrating ETL transformations using Pig Latin scripts, transformations, and join operations.
- Created Hive tables for executing HQL queries per client requirements.
- Optimized Apache Hive queries by partitioning on asofmonth for performance improvement.
- Moved data from different sources into Hadoop and defined detailed technical processes for data acquisition.
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Developed Oozie workflows for automating Hive, Pig, and Spark jobs.
- Configured TeamCity for Continuous Integration/Continuous Deployment (CI/CD) of code onto the edge node.
- Monitored workload, job performance, and capacity planning using Cloudera Manager.
- Responsible for writing validation scripts using shell scripting.
- Used GitHub as the version control repository.
- Worked on XML transformation using XSLT.
- Participated in and contributed to estimation and project planning with the team and architecture group.
Environment: Hadoop, HDFS, Apache NiFi, RabbitMQ, Oozie, Pig, Spark, Scala, Hive, Shell Scripting, Linux, Cloudera 5.4.
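Code sketch (standardization to Parquet): a minimal Spark (Scala) illustration of the normalization/standardization step referenced above; it z-scores a numeric column and writes the result as Parquet. Column names and paths are hypothetical placeholders.

    // Hypothetical example: standardize a measure column (zero mean, unit variance)
    // and persist the curated data as Parquet.
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{avg, col, stddev_samp}

    object StandardizeToParquet {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("standardize-to-parquet")
          .getOrCreate()

        val raw = spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("/landing/sales/raw/")

        // Compute the mean and sample standard deviation of the measure column once.
        val stats = raw.agg(avg("amount").as("mu"), stddev_samp("amount").as("sigma")).first()
        val mu = stats.getDouble(0)
        val sigma = stats.getDouble(1)

        // z-score standardization: (x - mean) / stddev.
        val standardized = raw.withColumn("amount_std", (col("amount") - mu) / sigma)

        standardized.write.mode("overwrite").parquet("/curated/sales/standardized/")

        spark.stop()
      }
    }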
Confidential, Columbus, OH
Big Data/Hadoop Developer
Responsibilities:
- Involved in the full project life cycle: design, analysis, logical and physical architecture modeling, development, implementation, and testing.
- Worked with Linux systems and RDBMS databases on a regular basis to ingest data using Sqoop.
- Created Hive tables and loaded transactional data from Teradata using Sqoop.
- Created and ran Sqoop jobs with incremental load to populate Hive external tables.
- Experience working with very large datasets distributed across large clusters.
- Developed optimal strategies for distributing the web log data over the cluster and for importing and exporting the stored web log data into HDFS and Hive using Sqoop.
- Extensive experience writing Pig scripts to transform raw data from several data sources into baseline data.
- Developed workflows in Oozie to automate the tasks of loading data into HDFS and pre-processing it with Pig.
- Implemented Hive generic UDFs to incorporate business logic into Hive queries (see the sketch after this section).
- Analyzed the web log data using HiveQL to extract the number of unique visitors per day, page views, visit duration, and the most visited pages on the website.
- Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as MapReduce, Pig, Hive, and Sqoop) as well as system-specific jobs (such as Java programs and shell scripts).
- Created Hive tables and worked on them using HiveQL.
- Built applications using Maven and integrated them with CI servers such as Jenkins to run build jobs.
- Exported the analyzed data to the RDBMS using Sqoop to generate reports for the BI team.
- Worked collaboratively with all levels of business stakeholders to architect, implement, and test a Big Data based analytical solution drawing on disparate sources.
- Created and maintained technical documentation for executing Hive queries, Pig scripts, and Sqoop jobs.
- Involved in Agile methodologies, daily scrum meetings, and sprint planning.
Environment: Hadoop, HDFS, Map Reduce, Hive, Pig, Hbase, Sqoop, Oozie, Maven, Shell Scripting, Cloudera.
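Code sketch (Hive generic UDF): a hedged illustration of the GenericUDF (ObjectInspector-based) API referenced above, written in Scala; it buckets a visit-duration value into a label for the web-log analysis. The function, class, and column names are hypothetical placeholders.

    // Hypothetical example: a Hive GenericUDF that classifies visit duration (seconds).
    import org.apache.hadoop.hive.ql.exec.UDFArgumentException
    import org.apache.hadoop.hive.ql.udf.generic.GenericUDF
    import org.apache.hadoop.hive.ql.udf.generic.GenericUDF.DeferredObject
    import org.apache.hadoop.hive.serde2.objectinspector.{ObjectInspector, PrimitiveObjectInspector}
    import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory

    class ClassifyVisit extends GenericUDF {
      private var argOI: PrimitiveObjectInspector = _

      override def initialize(args: Array[ObjectInspector]): ObjectInspector = {
        if (args.length != 1)
          throw new UDFArgumentException("classify_visit takes exactly one argument")
        argOI = args(0).asInstanceOf[PrimitiveObjectInspector]
        // The label returned to Hive is a plain Java string.
        PrimitiveObjectInspectorFactory.javaStringObjectInspector
      }

      override def evaluate(args: Array[DeferredObject]): AnyRef = {
        val raw = argOI.getPrimitiveJavaObject(args(0).get())
        if (raw == null) return null
        val seconds = raw.toString.toLong  // duration column assumed to be BIGINT
        if (seconds < 60) "bounce" else if (seconds < 600) "short" else "long"
      }

      override def getDisplayString(children: Array[String]): String =
        "classify_visit(" + children.mkString(", ") + ")"
    }

    // Usage in HiveQL (names and paths are placeholders):
    //   ADD JAR /path/to/udfs.jar;
    //   CREATE TEMPORARY FUNCTION classify_visit AS 'ClassifyVisit';
    //   SELECT classify_visit(visit_duration), COUNT(*) FROM web_logs
    //   GROUP BY classify_visit(visit_duration);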
Confidential, Reston, VA
Java/Hadoop Developer
Responsibilities:
- Responsible for managing data coming from different sources; involved in HDFS maintenance and data loading.
- Involved in creating Hive tables and in loading and analyzing data using Hive queries.
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Developed workflows in Oozie to automate the tasks of loading data into HDFS and pre-processing it with Pig.
- Used Oozie operational services for batch processing and scheduling workflows dynamically.
- Developed MapReduce programs to parse the raw data and store the refined data in tables (see the sketch after this section).
- Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades of Hortonworks as required.
- Developed multiple MapReduce jobs for data cleaning and preprocessing.
- Worked with Linux systems and RDBMS databases on a regular basis to ingest data using Sqoop.
- Developed the application using the Struts framework, which leverages the classical Model-View-Controller (MVC) architecture; UML diagrams such as use cases, class diagrams, interaction diagrams, and activity diagrams were used.
- Participated in requirement gathering and converted the requirements into technical specifications.
- Developed and maintained complex outbound notification applications that run on custom architectures, using diverse technologies including Core Java, J2EE, XML, JBoss, and web services.
- Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
- Generated the datasets and loaded them into the Hadoop ecosystem.
- Assisted in exporting analyzed data to relational databases using Sqoop.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries, Pig scripts, and Sqoop jobs.
Environment: Hortonworks, J2EE, SQL, Web Sphere, HTML, XML, ANT, Oracle, JavaScript, Hadoop, HDFS, Oozie, Pig, Hive, MapReduce, Sqoop, Flume, and LINUX.
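Code sketch (MapReduce data cleaning): a minimal map-only Hadoop job in Scala against the Java MapReduce API, of the kind referenced above; it drops malformed, pipe-delimited records. The delimiter, expected field count, and paths are hypothetical placeholders.

    // Hypothetical example: a map-only job that keeps only well-formed records.
    import org.apache.hadoop.conf.Configured
    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.io.{LongWritable, NullWritable, Text}
    import org.apache.hadoop.mapreduce.{Job, Mapper}
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
    import org.apache.hadoop.util.{Tool, ToolRunner}

    class CleanRecordsMapper extends Mapper[LongWritable, Text, NullWritable, Text] {
      override def map(key: LongWritable, value: Text,
                       context: Mapper[LongWritable, Text, NullWritable, Text]#Context): Unit = {
        val fields = value.toString.split("\\|", -1)
        // Keep only records with the expected number of fields and a non-empty id.
        if (fields.length == 12 && fields(0).nonEmpty) {
          context.write(NullWritable.get(), value)
        }
      }
    }

    object CleanRecordsJob extends Configured with Tool {
      override def run(args: Array[String]): Int = {
        val job = Job.getInstance(getConf, "clean-records")
        job.setJarByClass(classOf[CleanRecordsMapper])
        job.setMapperClass(classOf[CleanRecordsMapper])
        job.setNumReduceTasks(0) // map-only cleaning job
        job.setOutputKeyClass(classOf[NullWritable])
        job.setOutputValueClass(classOf[Text])
        FileInputFormat.addInputPath(job, new Path(args(0)))
        FileOutputFormat.setOutputPath(job, new Path(args(1)))
        if (job.waitForCompletion(true)) 0 else 1
      }

      def main(args: Array[String]): Unit =
        System.exit(ToolRunner.run(CleanRecordsJob, args))
    }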
Confidential
J2EE Developer
Responsibilities:
- Actively participated in requirements gathering, analysis, design, and testing phases.
- Designed use case diagrams, class diagrams, and sequence diagrams as a part of Design Phase.
- Strong experience in SQL and knowledge of Oracle and MS SQL Server databases.
- Experience in ETL software development.
- Created logical and physical data models and generated the SQL for the design using Erwin.
- Developed the Enterprise Java Beans (Stateless Session beans) to handle different transactions such as online funds transfer, bill payments to the service providers.
- Implemented Service Oriented Architecture (SOA) using JMS for sending and receiving messages while creating web services.
- Developed XML documents and generated XSL files for Payment Transaction and Reserve Transaction systems.
- Developed SQL queries and stored procedures.
- Developed Web Services for data transfer from client to server and vice versa using Apache Axis, SOAP and WSDL.
- Used the JUnit framework for unit testing of all the Java classes.
- Implemented various J2EE Design patterns like Singleton, Service Locator, DAO, and SOA.
- Worked on AJAX to develop an interactive Web Application and JavaScript for Data Validations.
Environment: J2EE, JDBC, SQL, Java, JSP, Struts, Hibernate, Web services, SOAP, WSDL, Design Patterns, MVC, HTML, JavaScript, WebLogic, XML, JUnit, WebSphere, MyEclipse