Data Engineer Resume
Burlington, MA
PROFESSIONAL SUMMARY:
- Overall 8+ years of professional experience covering analysis, design, development, integration, deployment, and maintenance of quality software applications using Java/J2EE technologies and Big Data Hadoop technologies.
- 4+ years of experience in Big Data implementations, with strong hands-on experience in major Hadoop ecosystem components such as MapReduce, HDFS, Hive, Pig, NiFi, Sqoop, and Spark.
- Excellent knowledge of Hadoop architecture and ecosystem components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and YARN.
- Good exposure to column-oriented NoSQL databases such as HBase and Cassandra.
- Extensive experience working with structured data using HiveQL, join operations, and custom UDFs, and in optimizing Hive queries (see the illustrative UDF sketch after this summary).
- Experience in importing and exporting data between HDFS and relational databases using Sqoop.
- Experienced with job workflow scheduling and monitoring tools such as Oozie and Fuse.
- Experience using various Hadoop distributions (Cloudera, Hortonworks, etc.) to fully implement and leverage new Hadoop features.
- Experienced in performing analytics on structured data using Hive queries, joins, query tuning, SerDes, and UDFs.
- Experience working with Spark features such as RDD transformations and Spark SQL.
- Worked on custom Pig loader and storage classes to handle a variety of data formats such as JSON and compressed CSV.
- Experience working with web development technologies such as HTML, CSS, JavaScript, and jQuery.
- Proficient in Java, J2EE, JDBC, Collections, Servlets, JSP, Spring, Hibernate, JSON, XML, REST, SOAP web services, and EclipseLink.
- Experienced in working with scripting technologies such as Python and UNIX shell scripts.
- Experience with source control repositories such as SVN, CVS, and Git.
- Strong experience working in UNIX/Linux environments and writing shell scripts.
- Skilled at building and deploying multi-module applications using Maven and Ant, integrated with CI servers such as Jenkins.
- Expertise in implementing MapReduce jobs with both the Java API and Apache Pig.
- Experience working with SQL and SQL performance tuning.
- Experienced in requirement analysis, application development, application migration, and maintenance using the Software Development Lifecycle (SDLC) and Java/J2EE technologies.
- Developed and maintained web applications using the Tomcat web server.
- Hands-on experience developing web applications using the Spring web module and integrating with the Struts MVC framework.
- Experience in writing database objects such as stored procedures, functions, triggers, PL/SQL packages, and cursors for Oracle, SQL Server, MySQL, and Sybase databases.
- Great team player and quick learner with effective communication, motivation, and organizational skills, combined with attention to detail and a focus on business improvements.
- Working knowledge of and experience with Agile and Waterfall methodologies.
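Illustrative code sketch (custom Hive UDF): a minimal simple UDF in Scala of the kind referenced in the Hive bullet above. The class name, column names, and jar path are hypothetical placeholders, not code from any engagement listed below.

    // Hypothetical example: a simple Hive UDF that masks the local part of an email.
    import org.apache.hadoop.hive.ql.exec.UDF
    import org.apache.hadoop.io.Text

    class MaskEmail extends UDF {
      // Hive resolves evaluate() by reflection for simple (non-generic) UDFs.
      def evaluate(input: Text): Text = {
        if (input == null) return null
        val parts = input.toString.split("@", 2)
        if (parts.length == 2) new Text("***@" + parts(1)) else new Text("***")
      }
    }

    // Usage in HiveQL after packaging the class into a jar (paths are placeholders):
    //   ADD JAR /path/to/udfs.jar;
    //   CREATE TEMPORARY FUNCTION mask_email AS 'MaskEmail';
    //   SELECT mask_email(email) FROM subscribers LIMIT 10;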
TECHNICAL SKILLS:
Big Data Ecosystems: HDFS, MapReduce, Apache NiFi, Hive, Pig, Sqoop, Oozie, Spark, ZooKeeper, Impala, YARN, RabbitMQ
Hadoop Distributions: Cloudera, Hortonworks
NoSQL Databases: HBase, Cassandra
Databases: Oracle, Vertica, MySQL, Teradata, PostgreSQL
Languages: C, Java, SQL, HQL, Python, Scala, Shell Scripting
Web Technologies: HTML, XML, CSS, JSON, Node.js, JavaScript
Web Servers: Apache Tomcat
Version Control: SVN, CVS, Git
Operating Systems: Linux, UNIX, Mac OS X, CentOS, Windows 8, Windows 7, Windows Server 2008/2003
PROFESSIONAL EXPERIENCE:
Confidential, Burlington, MA
Data Engineer
Responsibilities:
- Involved in the end-to-end process of loading subscriber data into tables and performing the sequence of steps needed to merge it with other data sources in Cloudera.
- Worked on ETL processes using Hive with different execution engines such as MapReduce, Tez, and Spark.
- Consolidated small files across large data sets using Spark (Scala) before creating tables on the data (see the first sketch after this section).
- Loaded large tables of around 400 GB in formats such as SequenceFile and Parquet.
- Wrote scripts to continuously dump and reload data between Oracle and HDFS using Sqoop.
- Implemented HQL scripts for creating Hive tables and for loading, analyzing, merging, binning, backfilling, and cleansing data in Hive.
- Ingested the final table data from Cloudera into the Vertica server and then loaded it into Vertica tables.
- Developed Python and shell scripts to automate the end-to-end implementation process of the AI project.
- Involved in the deployment process using Jenkins and worked closely on fixing version issues.
- Responsible for the monthly data refresh for all MVPDs to generate advertising campaign data.
- Designed and implemented static and dynamic partitioning and bucketing in Hive (see the second sketch after this section).
- Generated configs for the new InfoBase file, which has more than 1,000 columns.
- Explored Spark for improving the performance and optimization of existing Hadoop algorithms using Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Responsible for managing data coming from different sources; involved in HDFS maintenance and data loading.
- Created source repositories and data marts in the Data Transformation (DT) tool.
- Documented the process and mentored the analyst and test teams on writing Hive and Impala queries.
- Binned and backfilled huge amounts of data according to business logic.
- Documented the DT process for different sources.
- Involved in the full project life cycle: design, analysis, logical and physical architecture modeling, development, implementation, and testing.
Environment: Hadoop, HDFS, Hive, Spark, Shell Scripting, Python, Jenkins, Groovy, Tomcat, PostgreSQL, Stash, Cloudera, Oracle, Vertica, Sqoop.
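Code sketch (small-file consolidation): a minimal Spark (Scala) version of the consolidation step referenced above, which reads many small files, coalesces them into fewer partitions, and writes Parquet. Paths, options, and the partition count are hypothetical placeholders.

    // Hypothetical example: consolidate many small HDFS files into fewer, larger
    // Parquet files so a Hive table can be defined on top of them.
    import org.apache.spark.sql.SparkSession

    object ConsolidateSmallFiles {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("consolidate-small-files")
          .enableHiveSupport()
          .getOrCreate()

        // Read the raw landing area (many small CSV files).
        val raw = spark.read.option("header", "true").csv("/data/subscribers/raw/")

        // coalesce() reduces the number of output files without a full shuffle.
        raw.coalesce(32)
          .write
          .mode("overwrite")
          .parquet("/data/subscribers/consolidated/")

        spark.stop()
      }
    }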
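Code sketch (Hive partitioning and bucketing): a hedged illustration of static and dynamic partitioning plus a bucketed table, issued through spark.sql so it stays in the same language as the sketch above; it could equally be run as plain HQL in the Hive shell. Table and column names are hypothetical placeholders.

    // Hypothetical example: static/dynamic partitioning and bucketing in Hive.
    import org.apache.spark.sql.SparkSession

    object HivePartitioningSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("hive-partitioning-sketch")
          .enableHiveSupport()
          .getOrCreate()

        // Partitioned target table (Parquet, partitioned by month).
        spark.sql("""
          CREATE TABLE IF NOT EXISTS campaign_events (
            subscriber_id STRING,
            event_type    STRING,
            event_ts      TIMESTAMP)
          PARTITIONED BY (event_month STRING)
          STORED AS PARQUET""")

        // Static partition insert: the partition value is fixed in the statement.
        spark.sql("""
          INSERT OVERWRITE TABLE campaign_events PARTITION (event_month = '2018-01')
          SELECT subscriber_id, event_type, event_ts
          FROM staging_events
          WHERE date_format(event_ts, 'yyyy-MM') = '2018-01'""")

        // Dynamic partition insert: the partition value comes from the data itself.
        spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
        spark.sql("""
          INSERT OVERWRITE TABLE campaign_events PARTITION (event_month)
          SELECT subscriber_id, event_type, event_ts,
                 date_format(event_ts, 'yyyy-MM') AS event_month
          FROM staging_events""")

        // Bucketed table (DDL only): bucketed tables are typically populated from
        // the Hive side with hive.enforce.bucketing enabled.
        spark.sql("""
          CREATE TABLE IF NOT EXISTS subscriber_profile (
            subscriber_id STRING,
            attributes    MAP<STRING, STRING>)
          CLUSTERED BY (subscriber_id) INTO 32 BUCKETS
          STORED AS ORC""")

        spark.stop()
      }
    }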
Confidential, Cincinnati, OH
Big Data/Hadoop Developer
Responsibilities:
- Responsible for managing data coming from different sources; involved in HDFS maintenance and loading of structured and unstructured data.
- Developed a data ingestion service from a remote SaaS server into HDFS.
- Involved in migrating HDFS files from the old Dunnhumby domain to the new Confidential cluster.
- Developed flow XML files using Apache NiFi, a dataflow automation tool, to ingest data into HDFS.
- Worked on performance tuning of Apache NiFi workflows to optimize data ingestion speeds.
- Used RabbitMQ as a messaging service to notify downstream consumers about ingested data.
- Involved in the end-to-end implementation of ETL logic.
- Pre-processed the ingested data using Apache Pig to eliminate bad records per business requirements, with the help of filter functions and user-defined functions.
- Worked on an Apache Spark component in Scala to perform transformations such as normalization and standardization and to convert raw data to Apache Parquet (see the sketch after this section).
- Experienced in migrating ETL transformations using Pig Latin scripts, transformations, and join operations.
- Created Hive tables for executing HQL queries per client requirements.
- Optimized Apache Hive queries by partitioning on asofmonth for performance improvement.
- Moved data from different sources into Hadoop and defined detailed technical processes for data acquisition.
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Developed Oozie workflows for automating Hive, Pig, and Spark jobs.
- Configured TeamCity for Continuous Integration/Continuous Deployment (CI/CD) of code onto the edge node.
- Monitored workload, job performance, and capacity planning using Cloudera Manager.
- Responsible for writing validation scripts using shell scripting.
- Used GitHub as the version control repository.
- Worked on XML transformation using XSLT.
- Participated in and contributed to estimation and project planning with the team and architecture group.
Environment: Hadoop, HDFS, Apache NiFi, RabbitMQ, Oozie, Pig, Spark, Scala, Hive, Shell Scripting, Linux, Cloudera 5.4.
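Code sketch (standardization to Parquet): a minimal Spark (Scala) illustration of the normalization/standardization step referenced above; it z-scores a numeric column and writes the result as Parquet. Column names and paths are hypothetical placeholders.

    // Hypothetical example: standardize a measure column (zero mean, unit variance)
    // and persist the curated data as Parquet.
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{avg, col, stddev_samp}

    object StandardizeToParquet {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("standardize-to-parquet")
          .getOrCreate()

        val raw = spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("/landing/sales/raw/")

        // Compute the mean and sample standard deviation of the measure column once.
        val stats = raw.agg(avg("amount").as("mu"), stddev_samp("amount").as("sigma")).first()
        val mu = stats.getDouble(0)
        val sigma = stats.getDouble(1)

        // z-score standardization: (x - mean) / stddev.
        val standardized = raw.withColumn("amount_std", (col("amount") - mu) / sigma)

        standardized.write.mode("overwrite").parquet("/curated/sales/standardized/")

        spark.stop()
      }
    }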
Confidential, Columbus, OH
Big Data/Hadoop Developer
Responsibilities:
- Involved in the full project life cycle: design, analysis, logical and physical architecture modeling, development, implementation, and testing.
- Worked with Linux systems and RDBMS databases on a regular basis to ingest data using Sqoop.
- Created Hive tables and loaded transactional data from Teradata using Sqoop.
- Created and ran Sqoop jobs with incremental load to populate Hive external tables.
- Experience working with very large datasets distributed across large clusters.
- Developed optimal strategies for distributing the web log data over the cluster and for importing and exporting the stored web log data into HDFS and Hive using Sqoop.
- Extensive experience writing Pig scripts to transform raw data from several data sources into baseline data.
- Developed workflows in Oozie to automate the tasks of loading data into HDFS and pre-processing it with Pig.
- Implemented Hive generic UDFs to incorporate business logic into Hive queries (see the sketch after this section).
- Analyzed the web log data using HiveQL to extract the number of unique visitors per day, page views, visit duration, and the most visited pages on the website.
- Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as MapReduce, Pig, Hive, and Sqoop) as well as system-specific jobs (such as Java programs and shell scripts).
- Created Hive tables and worked on them using HiveQL.
- Built applications using Maven and integrated them with CI servers such as Jenkins to run build jobs.
- Exported the analyzed data to the RDBMS using Sqoop to generate reports for the BI team.
- Worked collaboratively with all levels of business stakeholders to architect, implement, and test a Big Data based analytical solution drawing on disparate sources.
- Created and maintained technical documentation for executing Hive queries, Pig scripts, and Sqoop jobs.
- Involved in Agile methodologies, daily scrum meetings, and sprint planning.
Environment: Hadoop, HDFS, Map Reduce, Hive, Pig, Hbase, Sqoop, Oozie, Maven, Shell Scripting, Cloudera.
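Code sketch (Hive generic UDF): a hedged illustration of the GenericUDF (ObjectInspector-based) API referenced above, written in Scala; it buckets a visit-duration value into a label for the web-log analysis. The function, class, and column names are hypothetical placeholders.

    // Hypothetical example: a Hive GenericUDF that classifies visit duration (seconds).
    import org.apache.hadoop.hive.ql.exec.UDFArgumentException
    import org.apache.hadoop.hive.ql.udf.generic.GenericUDF
    import org.apache.hadoop.hive.ql.udf.generic.GenericUDF.DeferredObject
    import org.apache.hadoop.hive.serde2.objectinspector.{ObjectInspector, PrimitiveObjectInspector}
    import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory

    class ClassifyVisit extends GenericUDF {
      private var argOI: PrimitiveObjectInspector = _

      override def initialize(args: Array[ObjectInspector]): ObjectInspector = {
        if (args.length != 1)
          throw new UDFArgumentException("classify_visit takes exactly one argument")
        argOI = args(0).asInstanceOf[PrimitiveObjectInspector]
        // The label returned to Hive is a plain Java string.
        PrimitiveObjectInspectorFactory.javaStringObjectInspector
      }

      override def evaluate(args: Array[DeferredObject]): AnyRef = {
        val raw = argOI.getPrimitiveJavaObject(args(0).get())
        if (raw == null) return null
        val seconds = raw.toString.toLong  // duration column assumed to be BIGINT
        if (seconds < 60) "bounce" else if (seconds < 600) "short" else "long"
      }

      override def getDisplayString(children: Array[String]): String =
        "classify_visit(" + children.mkString(", ") + ")"
    }

    // Usage in HiveQL (names and paths are placeholders):
    //   ADD JAR /path/to/udfs.jar;
    //   CREATE TEMPORARY FUNCTION classify_visit AS 'ClassifyVisit';
    //   SELECT classify_visit(visit_duration), COUNT(*) FROM web_logs
    //   GROUP BY classify_visit(visit_duration);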
Confidential, Reston, VA
Java/Hadoop Developer
Responsibilities:
- Responsible for managing data coming from different sources; involved in HDFS maintenance and data loading.
- Involved in creating Hive tables and in loading and analyzing data using Hive queries.
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Developed workflows in Oozie to automate the tasks of loading data into HDFS and pre-processing it with Pig.
- Used Oozie operational services for batch processing and scheduling workflows dynamically.
- Developed MapReduce programs to parse the raw data and store the refined data in tables (see the sketch after this section).
- Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades of Hortonworks as required.
- Developed multiple MapReduce jobs for data cleaning and preprocessing.
- Worked with Linux systems and RDBMS databases on a regular basis to ingest data using Sqoop.
- Developed the application using the Struts framework, which leverages the classical Model-View-Controller (MVC) architecture; UML diagrams such as use cases, class diagrams, interaction diagrams, and activity diagrams were used.
- Participated in requirement gathering and converted the requirements into technical specifications.
- Developed and maintained complex outbound notification applications that run on custom architectures, using diverse technologies including Core Java, J2EE, XML, JBoss, and web services.
- Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
- Generated the datasets and loaded them into the Hadoop ecosystem.
- Assisted in exporting analyzed data to relational databases using Sqoop.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries, Pig scripts, and Sqoop jobs.
Environment: Hortonworks, J2EE, SQL, Web Sphere, HTML, XML, ANT, Oracle, JavaScript, Hadoop, HDFS, Oozie, Pig, Hive, MapReduce, Sqoop, Flume, and LINUX.
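Code sketch (MapReduce data cleaning): a minimal map-only Hadoop job in Scala against the Java MapReduce API, of the kind referenced above; it drops malformed, pipe-delimited records. The delimiter, expected field count, and paths are hypothetical placeholders.

    // Hypothetical example: a map-only job that keeps only well-formed records.
    import org.apache.hadoop.conf.Configured
    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.io.{LongWritable, NullWritable, Text}
    import org.apache.hadoop.mapreduce.{Job, Mapper}
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
    import org.apache.hadoop.util.{Tool, ToolRunner}

    class CleanRecordsMapper extends Mapper[LongWritable, Text, NullWritable, Text] {
      override def map(key: LongWritable, value: Text,
                       context: Mapper[LongWritable, Text, NullWritable, Text]#Context): Unit = {
        val fields = value.toString.split("\\|", -1)
        // Keep only records with the expected number of fields and a non-empty id.
        if (fields.length == 12 && fields(0).nonEmpty) {
          context.write(NullWritable.get(), value)
        }
      }
    }

    object CleanRecordsJob extends Configured with Tool {
      override def run(args: Array[String]): Int = {
        val job = Job.getInstance(getConf, "clean-records")
        job.setJarByClass(classOf[CleanRecordsMapper])
        job.setMapperClass(classOf[CleanRecordsMapper])
        job.setNumReduceTasks(0) // map-only cleaning job
        job.setOutputKeyClass(classOf[NullWritable])
        job.setOutputValueClass(classOf[Text])
        FileInputFormat.addInputPath(job, new Path(args(0)))
        FileOutputFormat.setOutputPath(job, new Path(args(1)))
        if (job.waitForCompletion(true)) 0 else 1
      }

      def main(args: Array[String]): Unit =
        System.exit(ToolRunner.run(CleanRecordsJob, args))
    }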
Confidential
J2EE Developer
Responsibilities:
- Actively participated in requirements gathering, analysis, design, and testing phases.
- Designed use case diagrams, class diagrams, and sequence diagrams as a part of Design Phase.
- Strong experience in SQL and knowledge of Oracle and MS SQL Server databases.
- Experience in ETL software development.
- Created logical and physical data models and generated the SQL for the design using Erwin.
- Developed the Enterprise Java Beans (Stateless Session beans) to handle different transactions such as online funds transfer, bill payments to the service providers.
- Implemented Service Oriented Architecture (SOA) using JMS for sending and receiving messages while creating web services.
- Developed XML documents and generated XSL files for Payment Transaction and Reserve Transaction systems.
- Developed SQL queries and stored procedures.
- Developed Web Services for data transfer from client to server and vice versa using Apache Axis, SOAP and WSDL.
- Used the JUnit framework for unit testing of all the Java classes.
- Implemented various J2EE Design patterns like Singleton, Service Locator, DAO, and SOA.
- Worked on AJAX to develop an interactive Web Application and JavaScript for Data Validations.
Environment: J2EE, JDBC, SQL, Java, JSP, Struts, Hibernate, Web services, SOAP, WSDL, Design Patterns, MVC, HTML, JavaScript, WebLogic, XML, JUnit, WebSphere, MyEclipse