Data Engineer Resume

Denver, CO

SUMMARY

  • 8+ years of experience in development, design, integration, and presentation with Java, along with extensive Big Data/Hadoop experience across the Hadoop ecosystem, including Hive, Pig, Flume, Sqoop, ZooKeeper, HBase, Spark, Kafka, Python, and AWS.
  • Experience implementing big data projects on Cloudera 5.6/5.8/5.13, Hortonworks 2.7, and AWS 5.6, 5.20, and 5.29 versions.
  • Installed, Configured and Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
  • Hands-on experience in designing and implementing solutions using Apache Hadoop 2.4.0, HDFS 2.7, MapReduce2, HBase 1.1, Hive 1.2, Oozie 4.2.0, Tez 0.7.0, YARN 2.7.0, Sqoop 1.4.6, and MongoDB.
  • Setting up and integrating Hadoop ecosystem tools - HBase, Hive, Pig, Sqoop etc.
  • Expertise in Big Data architectures such as distributed Hadoop (Azure, Hortonworks, Cloudera), MongoDB, and NoSQL.
  • 3+ years of Looker experience in both development and administration.
  • Hands-on experience loading data into Spark RDDs and performing in-memory computation (see the PySpark sketch after this list).
  • Strong understanding of Data Modeling and experience with Data Cleansing, Data Profiling and Data analysis.
  • Good knowledge of AWS services such as EMR and EC2, which provide fast and efficient processing of Big Data.
  • Experience using Microsoft Azure, including ADF, ADLS, Azure Blob Storage, and Azure ML.
  • Extensive experience in Hadoop Architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node and Map Reduce concepts.
  • Expertise in Apache Spark Development (Spark SQL, Spark Streaming, MLlib, GraphX, Zeppelin, HDFS, YARN and NoSQL).
  • Experience in the design and construction of data warehouses in Snowflake and Redshift.
  • Implemented data solutions using Snowflake, AWS, Oracle, and SQL Server databases.
  • Experience in analyzing data using Hive, Pig Latin and custom MR programs in Java.
  • Hands-on experience writing Spark SQL scripts.
  • Experience working with Oracle APEX development and the Oracle framework.
  • Sound knowledge of programming Spark using Scala.
  • Good understanding of real-time data processing using Spark.
  • Experience with Looker custom table calculations such as offset().
  • Experience in Designing and developing ETL workflows and datasets in Alteryx to be used by the BI Reporting tool.
  • Publish reports to business users with the help of Alteryx gallery and server
  • Experienced with NoSQL databases (HBase, Cassandra, and MongoDB), including database performance tuning and data modeling.
  • Strong experience in front-end technologies such as JSP, HTML5, jQuery, JavaScript, and CSS3.
  • Experienced in using Spark to improve the performance of and optimize existing algorithms in Hadoop with SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Good experience building pipelines using Azure Data Factory and moving data into Azure Data Lake Store.
  • Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premise databases to Azure Data Lake Store using Azure Data Factory.
  • Hands-on experience building data lakes using data lake tools such as Cask CDAP and Zaloni.
  • Used DataStage Director and its run-time engine to schedule the solution, test and debug its components, and monitor the resulting executables (on an ad hoc or scheduled basis).
  • Extensive knowledge in programming with Resilient Distributed Data sets (RDDs).
  • Configured Hadoop clusters on OpenStack and Amazon Web Services (AWS).
  • Experience in ETL (DataStage) analysis, design, development, testing, and implementation of ETL processes, including performance tuning and query optimization of databases.
  • Experience in extracting source data from Sequential files, XML files, Excel files, transforming and loading it into the target data warehouse.
  • Strong experience with Java/J2EE technologies such as Core Java, JDBC, JSP, JSTL, HTML, JavaScript, JSON
  • Experience developing iterative algorithms using Spark Streaming in Scala and Python to build near real-time dashboards.
  • Experience in deploying and managing multi-node development and production Hadoop clusters with different Hadoop components (Hive, Pig, Sqoop, Oozie, Flume, HCatalog, HBase, ZooKeeper) using Hortonworks Ambari.
  • Gaining optimum performance with data compression, region splits and by manually managing compaction in HBase.
  • Experience in cloud and hybrid-cloud computing with Google Cloud, acting as a trusted advisor to decision-makers.
  • Worked with BigQuery to handle complex queries.
  • Deep understanding of, and design and development experience with, Google Cloud Platform (GCP).
  • Upgraded from HDP 2.1 to HDP 2.2 and then to HDP 2.3.
  • Working experience in Map Reduce programming model and Hadoop Distributed File System.
  • Hands on experience on Unix/Linux environments, which included software installations/ upgrades, shell scripting for job automation and other maintenance activities.
  • Hands on experience in installing, configuring and using Hadoop ecosystem components like Hadoop MapReduce, HDFS, HBase, Hive, Sqoop, Pig, Zookeeper, Storm, Spark, Kafka and Flume.
  • Thorough knowledge and experience in SQL and PL/SQL concepts.
  • Expertise in setting up standards and processes for Hadoop based application design and implementation.
  • Experienced with the setup, configuration, and maintenance of the ELK stack (Elasticsearch, Logstash, and Kibana).
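
A minimal PySpark sketch of the RDD loading and in-memory computation pattern referenced above. The file path, delimiter, and per-user aggregation are illustrative assumptions, not details from any specific engagement.

    # Minimal sketch: load a delimited file into an RDD, cache it in memory, and aggregate.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("rdd-in-memory-sketch").getOrCreate()
    sc = spark.sparkContext

    lines = sc.textFile("hdfs:///data/raw/events.tsv").cache()   # keep in memory for reuse

    counts = (
        lines.map(lambda line: line.split("\t"))
             .filter(lambda fields: len(fields) >= 2)            # drop malformed rows
             .map(lambda fields: (fields[0], 1))                 # key by user id
             .reduceByKey(lambda a, b: a + b)                    # in-memory aggregation
    )

    for user, n in counts.take(10):
        print(user, n)

    spark.stop()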

TECHNICAL SKILLS

Operating Systems: Linux (CentOS, Ubuntu), UNIX, iOS, TinyOS, Sun Solaris, HP-UX, Windows 7, Windows 8.

Hadoop/Big Data: Apache Spark, HDFS, MapReduce, MRUnit, YARN, Hive, Pig, HBase, Impala, Zookeeper, Sqoop, Oozie, Apache Cassandra, Scala, Flume, Apache Ignite, Avro, AWS, Google Cloud Platform (GCP)

Languages: Scala, Java (JDK 1.4/1.5/1.6), C/C++, SQL, HQL, R, Python, XPath, Spark, PL/SQL, Pig Latin.

Data Warehousing & BI: Informatica PowerCenter 9.x/8.x/7.x, PowerExchange, IDQ, Ambari Views, consumption framework

ETL Tools: IBM InfoSphere DataStage 11.5, MSBI (SSIS), Sqoop, TDCH, manual ETL, etc.

Databases: Oracle 11g, AWS Redshift, AWS Athena, IBM Netezza, HBase, Apache Phoenix, SQL Server, MySQL, MongoDB, Cassandra, Looker.

Debugging Tools: Microsoft SQL Server Management Studio 2008, Business Intelligence Development Studio 2008, RAD, Subversion, BMC Remedy, JIRA.

Version Control: TortoiseHg, Microsoft TFS, SVN, Git, CVS; Teradata utilities: TPump, MultiLoad, FastExport.

GUI Editors: IntelliJ IDEA Community Edition, DataGrip, DbVisualizer, DBeaver

PROFESSIONAL EXPERIENCE

Confidential, Denver, CO

Data Engineer

RESPONSIBILITIES:

  • Migrated on-premise ETL pipelines running on IBM Netezza to AWS; developed and automated the process to migrate data to AWS S3, run ETL using Spark on EC2, and deliver data to S3, AWS Athena, and AWS Redshift (a hedged PySpark sketch of this pattern follows this list).
  • Involved in requirements gathering and building a data lake on top of HDFS; worked with the GoCD CI/CD tool to deploy the application and with a framework for big data testing.
  • Transformed and analyzed data using PySpark and Hive based on ETL mappings.
  • Worked on Looker versions 3.x through 6.x.
  • Converted reports from Qlik Sense and Tableau to Looker.
  • Developed PySpark programs, created DataFrames, and worked on transformations.
  • Worked on data processing, transformations, and actions in Spark using Python (PySpark).
  • Experience in integrating DataStage with BPM tools
  • Set up data actions and table calculations in Looker.
  • Designed and built scalable DataStage solutions.
  • Experience taking requirements against raw data and producing consumable data sets from them using DataStage; standard ETL development experience.
  • Migrated data to the target state and built sub-data pipelines in Google Cloud Platform (GCP).
  • Design, and implement large scale distributed data processing systems, data warehouses, data pipelines, and flows.
  • Used the Hortonworks distribution for the Hadoop ecosystem.
  • Created Sqoop jobs for importing data from relational database systems into HDFS and for exporting results back into the databases.
  • Extensively used Pig for data cleansing via Pig scripts and embedded Pig scripts.
  • Scheduled the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Wrote Python scripts to analyze customer data.
  • Created partitioned tables in Hive, designed a data warehouse using Hive external tables, and wrote Hive queries for analysis.
  • Completed a POC and implementation to interface with Scality S3, an AWS S3-compatible storage system.
  • Maintained and developed the Scality SDK for end users.
  • Captured the data logs from web server into HDFS using Flume for analysis.
  • Performed advanced procedures like text analytics and processing using the in-memory computing capabilities of Spark.
  • Implemented Spark RDD transformations to map business analysis and applied actions on top of those transformations.
  • Involved in migrating MapReduce jobs into Spark jobs and used Spark SQL and the DataFrames API to load structured data into Spark clusters.
  • Built the infrastructure required for optimal extraction, transformation, and loading (ETL) of data from a wide variety of data sources such as Salesforce, SQL Server, Oracle, and SAP using Azure, Spark, Python, Hive, Kafka, and other big data technologies.
  • Used DataFrames and RDDs for data transformations.
  • Designed and developed Spark workflows using Scala to pull data from cloud-based systems and apply transformations on it.
  • Used Spark Streaming to consume topics from the distributed messaging source Event Hubs and periodically push batches of data to Spark for real-time processing.
  • Tuned Cassandra and MySQL for optimizing the data.
  • Implemented monitoring and established best practices around the usage of Elasticsearch.
  • Used the Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
  • Hands-on experience with Hortonworks tools such as Tez and Ambari.
  • Worked on Apache NiFi as an ETL tool for batch processing and real-time processing.
  • Fetched and generated monthly reports and visualized them using Tableau.
  • Developed Tableau visualizations and dashboards using Tableau Desktop.
  • Extracted files from Cassandra through Sqoop and placed in HDFS for further processing.
  • Strong working experience on Cassandra for retrieving data from Cassandra clusters to run queries.
  • Experience in data modeling using Cassandra.
  • Very good understanding of the Cassandra cluster mechanism, including replication strategies, snitches, gossip, consistent hashing, and consistency levels.
  • Experience in Designing and developing ETL workflows and datasets in Alteryx to be used by the BI Reporting tool.
  • Publish reports to business users with the help of Alteryx gallery and server.
  • Used the DataStax Spark-Cassandra connector to load data into Cassandra and used CQL to analyze data from Cassandra tables for quick searching, sorting, and grouping (see the connector sketch after this list).
  • Worked with BI (Business Intelligence) teams in generating reports and designing ETL workflows on Tableau.
  • Deployed data from various sources into HDFS and built reports using Tableau.
  • Worked extensively on creating MapReduce jobs to power data for search and aggregation.
  • Managed Hadoop jobs as DAGs using the Oozie workflow scheduler.
  • Involved in developing code to write canonical model JSON records from numerous input sources to Kafka Queues.
  • Involved in loading data from Linux file systems, servers, and Java web services using Kafka producers and consumers.
  • Migrated data from an Elasticsearch 1.4.3 cluster to Elasticsearch 5.6.4 using Logstash and Kafka for all environments.
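
A hedged PySpark sketch of the S3-based ETL pattern described in the first bullet above: read raw extracts from S3, apply transformations, and write partitioned Parquet back to S3 for Athena and Redshift to consume. Bucket names, columns, and the cleansing rule are hypothetical.

    # Read a hypothetical raw extract from S3, transform it, and write
    # partitioned Parquet for downstream Athena/Redshift queries.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("s3-etl-sketch").getOrCreate()

    orders = spark.read.option("header", True).csv("s3://example-raw-bucket/orders/")

    cleaned = (
        orders.withColumn("order_ts", F.to_timestamp("order_ts"))
              .withColumn("order_date", F.to_date("order_ts"))
              .filter(F.col("order_amount").cast("double") > 0)   # simple cleansing rule
    )

    (cleaned.write
            .mode("overwrite")
            .partitionBy("order_date")          # partitions enable Athena/Spectrum pruning
            .parquet("s3://example-curated-bucket/orders/"))

    spark.stop()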
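And a minimal sketch of reading from and writing to Cassandra through the DataStax Spark-Cassandra connector from PySpark, as referenced in the connector bullet above. The keyspace, tables, and host are assumptions, and the connector package must be available on the Spark classpath.

    # Read a Cassandra table into a DataFrame, aggregate, and write back.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("cassandra-connector-sketch")
             .config("spark.cassandra.connection.host", "cassandra-host")  # assumed host
             .getOrCreate())

    users = (spark.read
             .format("org.apache.spark.sql.cassandra")
             .options(keyspace="example_ks", table="users")
             .load())

    summary = users.groupBy("country").count()

    (summary.write
            .format("org.apache.spark.sql.cassandra")
            .options(keyspace="example_ks", table="users_by_country")
            .mode("append")
            .save())

    spark.stop()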

ENVIRONMENT: Hadoop, Hive, MapReduce, Sqoop, Spark, Eclipse, Maven, Java, Agile methodologies, AWS, Tableau, Pig, Elasticsearch, Storm, Cassandra, Impala, Oozie, Python, Shell Scripting, Java Collections, MySQL, Apache Avro, Zookeeper, SVN, Jenkins, Windows AD, Windows KDC, Hortonworks distribution of Hadoop 2.3, YARN, Ambari

Confidential, Framingham, MA

Data Engineer

RESPONSIBILITIES:

  • Worked on analyzing Hadoop cluster and different big data analytic tools including Pig, HBase database and Sqoop.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Implemented a nine-node CDH3 Hadoop cluster on CentOS.
  • Analyzed and cleansed raw data using HiveQL.
  • Performed data transformations using MapReduce and Hive for different file formats.
  • Involved in converting Hive/SQL queries into transformations using Python.
  • Performed complex joins on tables in Hive with various optimization techniques.
  • Created Hive tables as per requirements, internal or external, defined with appropriate static and dynamic partitions for efficiency.
  • Worked extensively with Hive DDLs and Hive Query Language (HQL).
  • Involved in loading data from edge node to HDFS using shell scripting.
  • Understand and manage Hadoop Log Files.
  • Manage Hadoop infrastructure with Cloudera Manager.
  • Created and maintained technical documentation for launching Hadoop cluster and for executing Hive queries.
  • Built integrations between applications, primarily Salesforce.
  • Extensive work in Informatica Cloud.
  • Expertise in Informatica Cloud apps: Data Synchronization (DS), Data Replication (DR), task flows, mapping configurations, and real-time apps such as Process Designer and Process Developer.
  • Worked extensively with flat files, loading them into on-premise applications and retrieving data from applications back into files.
  • Developed Informatica Cloud Real Time (ICRT) processes.
  • Worked with WSDL and SoapUI for APIs.
  • Wrote SOQL queries and created test data in Salesforce for unit testing of Informatica Cloud mappings.
  • Prepared TDDs and test case documents after each process was developed.
  • Identify and validate data between source and target applications.
  • Verify data consistency between systems.
  • Collaborated with a team to develop, deploy, maintain, and update cloud solutions; designed solution architecture on Google Cloud Platform (GCP).
  • Experience building data integrations with Python, Python API data extraction, Airflow, container applications, Docker, Kubernetes, BigQuery, and GCP.
  • Implemented a script to transmit information from Oracle to HBase using Sqoop.
  • Implemented best income logic using Pig scripts and UDFs.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Applied design patterns and OO design concepts to improve the existing Java/J2EE based code base.
  • Developed JAX-WS web services.
  • Handled Type 1 and Type 2 slowly changing dimensions.
  • Importing and exporting data into HDFS from database and vice versa using Sqoop.
  • Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying.
  • Involved in the design, implementation and maintenance of Data warehouses
  • Involved in creating Hive tables, loading with data and writing Hive queries.
  • Implemented custom interceptors for flume to filter data as per requirement.
  • Used Hive and Pig to analyze data in HDFS to identify issues and behavioral patterns.
  • Created internal and external Hive tables and defined static and dynamic partitions for optimized performance.
  • Configured daily workflow for extraction, processing and analysis of data using Oozie Scheduler.
  • Proactively involved in ongoing maintenance, support, and improvements in the Hadoop cluster.
  • Wrote Pig Latin scripts for running advanced analytics on the data collected.
  • Understood data ingestion pipelines and data lineage traces using Google Cloud Platform (GCP) products such as BigQuery, Dataflow, Cloud Data Fusion, and Dataprep.
  • Experience with Google Cloud Platform (GCP) and Google BigQuery, along with strong SQL skills.
  • Experience with Google Cloud Platform (GCP), especially BigQuery; developed scripts to transfer data from external data sources to BigQuery (a hedged Airflow sketch follows this list).
  • Advanced knowledge of the Google Cloud Platform (GCP) ecosystem around BigQuery.
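
A hedged sketch of a daily Airflow DAG that loads a CSV extract from GCS into BigQuery, in the spirit of the Python/Airflow/BigQuery integrations above. The project, dataset, bucket, and schedule are hypothetical; it assumes Airflow 2.x and the google-cloud-bigquery client with default credentials.

    # Daily DAG: load a GCS CSV extract into a BigQuery table.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator
    from google.cloud import bigquery

    def load_extract_to_bq():
        client = bigquery.Client()                       # default GCP credentials
        job_config = bigquery.LoadJobConfig(
            source_format=bigquery.SourceFormat.CSV,
            skip_leading_rows=1,
            autodetect=True,                             # infer schema for the sketch
            write_disposition="WRITE_TRUNCATE",
        )
        job = client.load_table_from_uri(
            "gs://example-bucket/exports/orders.csv",    # assumed source file
            "example-project.analytics.orders",          # assumed target table
            job_config=job_config,
        )
        job.result()                                     # block until the load finishes

    with DAG(
        dag_id="gcs_to_bigquery_orders",
        start_date=datetime(2021, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        PythonOperator(task_id="load_orders", python_callable=load_extract_to_bq)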

ENVIRONMENT: Hadoop 2.6.0, HDFS, YARN, Pig, Sqoop, HBase, Hive 0.13, Spark, Oozie, Storm, MongoDB, MySQL, CDH3, MapR, Shell Scripting, Jenkins, Windows AD, Windows KDC, Hortonworks distribution of Hadoop 2.3, Ambari, Red Hat Linux, CentOS, Java 1.6, UNIX, T-SQL.

Confidential, Overland Park, KS

Data Engineer

RESPONSIBILITIES:

  • Provided suggestions on converting to Hadoop using MapReduce, Hive, Sqoop, Flume, and Pig Latin.
  • Experience in writing Spark applications for data validation, cleansing, transformations, and custom aggregations.
  • Imported data from different sources into Spark RDDs for processing.
  • Developed custom aggregate functions using Spark SQL and performed interactive querying (see the sketch after this list).
  • Expertise in deploying Snowflake features such as data sharing, events, and lake-house patterns.
  • Hands-on experience with Snowflake utilities, SnowSQL, Snowpipe, and big data modeling techniques using Python.
  • Worked on installing the cluster, commissioning and decommissioning of DataNodes, NameNode high availability, capacity planning, and slot configuration.
  • Experience building data integrations with Python, Python API data extraction, Airflow, container applications, Docker, Kubernetes, BigQuery, and GCP.
  • Responsible for managing data coming from different sources.
  • Imported and exported data into HDFS using Flume.
  • Experienced in analyzing data with Hive and Pig.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Set up and benchmarked Hadoop/HBase clusters for internal use.
  • Set up a Hadoop cluster on Amazon EC2 using Whirr for a POC.
  • Worked on developing applications in Hadoop Big Data Technologies-Pig, Hive, Map-Reduce, Oozie, Flume, and Kafka.
  • Experienced in managing and reviewing Hadoop log files.
  • Helped with big data technologies for the integration of Hive with HBase and Sqoop with HBase.
  • Analyzed data with Hive, Pig, and Hadoop Streaming.
  • Involved in transferring the relational database legacy tables to HDFS and HBase tables using Sqoop, and vice versa.
  • Involved in cluster coordination services through ZooKeeper and in adding new nodes to an existing cluster.
  • Moved the data from traditional databases like MySQL, MS SQL Server and Oracle into Hadoop.
  • Worked on Integrating Talend and SSIS with Hadoop and performed ETL operations.
  • Installed Hive, Pig, Flume, Sqoop and Oozie on the Hadoop cluster.
  • Used Flume to collect, aggregate and push log data from different log servers.
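
A minimal sketch of a custom aggregate used from PySpark alongside interactive Spark SQL querying, as referenced in the custom-aggregate bullet above. The data, the grouped-aggregate pandas UDF, and its trimming logic are illustrative assumptions (it assumes Spark 3.x with PyArrow available).

    # Define a grouped-aggregate pandas UDF, use it with groupBy().agg(),
    # then run an ad hoc Spark SQL query against the same data.
    import pandas as pd
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import pandas_udf

    spark = SparkSession.builder.appName("custom-agg-sketch").getOrCreate()

    @pandas_udf("double")
    def trimmed_mean(values: pd.Series) -> float:
        # Mean after dropping the bottom/top 5% of values (illustrative logic)
        lo, hi = values.quantile(0.05), values.quantile(0.95)
        return float(values[(values >= lo) & (values <= hi)].mean())

    df = spark.createDataFrame(
        [("store_a", 10.0), ("store_a", 12.5), ("store_b", 99.0)],
        ["store", "sale_amount"],
    )
    df.createOrReplaceTempView("sales")

    # Custom aggregate applied per group
    df.groupBy("store").agg(trimmed_mean("sale_amount").alias("robust_avg")).show()

    # Interactive ad hoc query over the same view through Spark SQL
    spark.sql("SELECT store, COUNT(*) AS n, AVG(sale_amount) AS avg_sale "
              "FROM sales GROUP BY store").show()

    spark.stop()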

ENVIRONMENT: Pig, Hive, Oozie, Sqoop, Flume, HBase, Java, Maven, Avro, Cloudera Hadoop, Hortonworks, Linux, HDFS, MapReduce, Oracle, SQL Server, Eclipse, Shell Scripting, and the Oozie scheduler.

Confidential

Data Engineer

RESPONSIBILITIES:

  • Provided suggestions on converting to Hadoop using MapReduce, Hive, Sqoop, Flume, and Pig Latin.
  • Experience in writing Spark applications for data validation, cleansing, transformations, and custom aggregations.
  • Moved data from traditional databases such as MySQL, MS SQL Server, and Oracle into Hadoop.
  • Worked on Integrating Talend and SSIS with Hadoop and performed ETL operations.
  • Installed Hive, Pig, Flume, Sqoop and Oozie on the Hadoop cluster.
  • Built reporting data warehouse from ERP system using Order Management, Invoice & Service contracts modules.
  • Extensive work in Informatica Powercenter.
  • Acted as SME for Data Warehouse related processes.
  • Performed Data analysis for building Reporting Data Mart.
  • Worked with Reporting developers to oversee the implementation of report/universe designs.
  • Tuned performance of Informatica mappings and sessions for improving the process and making it efficient after eliminating bottlenecks.
  • Worked on complex SQL queries and PL/SQL procedures and converted them to ETL tasks.
  • Worked with PowerShell and UNIX scripts for file transfer, emailing, and other file-related tasks.
  • Worked on deployments from Dev to UAT, and then to Prod.
  • Worked with Informatica Cloud for data integration between Salesforce, RightNow, Eloqua, and web service applications.
  • Experienced in managing and reviewing Hadoop log files.
  • Practical work experience with Hadoop Ecosystem (i.e. Hadoop, Hive, Pig, Sqoop etc.)
  • Experience with UNIX and Linux.
  • Conducted trainings on Hadoop MapReduce, Pig, and Hive.
  • Demonstrated up-to-date expertise in Hadoop and applied it to development, execution, and improvement.
  • Worked on a migration project that included migrating webMethods code to Informatica Cloud.
  • Implemented proofs of concept for SOAP and REST APIs.
  • Built web service mappings and exposed them as SOAP WSDLs.
  • Worked with Reporting developers to oversee the implementation of reports/dashboard designs in Tableau.
  • Assisted report developers with writing required logic and achieve desired goals.
  • Met End Users for gathering and analyzing the requirements.
  • Worked with Business users to identify root causes for any data gaps and developing corrective actions accordingly.
  • Created Ad hoc Oracle data reports for presenting and discussing the data issues with Business.
  • Performed gap analysis after reviewing requirements.
  • Identified data issues within DWH dimension and fact tables like missing keys, joins, etc.
  • Wrote SQL queries to identify and validate data inconsistencies in the data warehouse against the source system (a hedged validation sketch follows this list).
  • Coordinated and provided technical details to reporting developers.
  • Imported data from different sources into Spark RDD for processing.
  • Developed custom aggregate functions using Spark SQL and performed interactive querying.
  • Imported and exported data into HDFS using Flume.
  • Experienced in analyzing data with Hive and Pig.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Involved in transferring the relational database legacy tables to HDFS and HBase tables using Sqoop, and vice versa.
  • Involved in cluster coordination services through ZooKeeper and in adding new nodes to an existing cluster.
  • Used Flume to collect, aggregate, and push log data from different log servers.
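
A hedged PySpark sketch of the source-versus-warehouse validation described above: reconcile row counts and surface keys that exist in the source extract but are missing from the warehouse dimension. The table names and key column are assumptions.

    # Compare a source extract with a warehouse dimension: row counts plus
    # an anti-join to find keys that never reached the target.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("dwh-validation-sketch")
             .enableHiveSupport()
             .getOrCreate())

    source = spark.table("staging.customer_src")    # assumed source extract
    target = spark.table("dwh.dim_customer")        # assumed warehouse dimension

    # 1. Row-count reconciliation
    print("source rows:", source.count(), "target rows:", target.count())

    # 2. Keys present in the source but missing from the dimension
    missing = (source.select("customer_id").distinct()
                     .join(target.select("customer_id").distinct(),
                           on="customer_id", how="left_anti"))
    missing.show(20, truncate=False)

    spark.stop()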

ENVIRONMENT: Java 1.6, J2EE, Oracle, SQL Server, MongoDB, HTML, XML, SQL, PL/SQL, JUnit, JDBC, JSP, Tomcat, JavaScript, CSS, Hibernate, MVC, Maven, Eclipse, WebSphere.

Confidential

Java/ J2EE Developer

RESPONSIBILITIES:

  • Involved in the design of JSPs and Servlets for navigation among the modules.
  • Designed cascading style sheets and the XML portion of the Order Entry and Product Search modules, and performed client-side validations with JavaScript.
  • Developed client customized interfaces for various clients using CSS and JavaScript.
  • Designed and implemented the User interface using HTML, CSS, JavaScript and SQL Server.
  • Developed interfaces using JSP based on users, roles, and permissions; screen options were displayed according to user permissions.
  • This was coded using custom tags in JSP with tag libraries.
  • Created web services using Advanced J2EE technologies to communicate with external systems.
  • Involved in the UI development, including layout and front-end coding per the requirements of the client by using JavaScript and Ext JS.
  • Used Hibernate along with Spring Framework to integrate with Oracle database.
  • Built complex SQL queries and scripts for data extraction and analysis to define the application requirements.
  • Developed the UI using HTML, JavaScript, and JSP, and developed business logic and interfacing components using Business Objects, XML, and JDBC.
  • Designed user-interface and checking validations using JavaScript.
  • Managed connectivity using JDBC for querying/inserting & data management including triggers and stored procedures.
  • Developed various EJBs for handling business logic and data manipulations from databases.
  • Performed code reviews for peers and maintained the code repositories using Git.
  • Enhanced the mechanism of logging and tracing with Log4j.
  • Generated web service clients using WSDL files.
  • Involved in development of the presentation layer using Struts and custom tag libraries.
  • Performed integration testing, supported the project, and tracked progress with the help of JIRA.
  • Acted as the first point of contact for the Business queries during development and testing phase.
  • Working closely with clients and QA team to resolve critical issues/bugs.

ENVIRONMENT: JSP, Servlets, Struts, Hibernate, HTML, CSS, JavaScript, JSON, REST, JUnit, XML, SASS, DOM, Web Logic (Oracle App server), Web Services, Eclipse, Agile.
