
Sr. Data Engineer Resume


Bethesda, MD

SUMMARY

  • Over 8 years of professional IT experience across a variety of industries, including hands-on experience with the Big Data ecosystem in ingestion, storage, querying, processing and analysis of big data.
  • Experience with different Hadoop distributions such as Cloudera (CDH), Hortonworks (HDP) and Elastic MapReduce (EMR).
  • Hands on experience in developing predictive models by using machine learning.
  • Implemented various machine learning techniques such as Random Forest, k-means and logistic regression for predictions and pattern identification using Spark MLlib.
  • Involved in performing the Linear Regression using Scala API and Spark.
  • Responsible for building Hadoop clusters with the Hortonworks/Cloudera distributions and integrating them with the Pentaho Data Integration (PDI) server.
  • Extensively worked on various machine learning algorithms; used NLTK, a natural language processing (NLP) library in Python, to build models.
  • Good Knowledge of Deep learning, Neural Networks, Convolutional Neural Networks (CNN).
  • Extensive experience in Spark/Scala, PySpark, MapReduce (MRv1) and MapReduce v2 (YARN).
  • Involved in creation and designing of data ingest pipelines using technologies such as Apache Kafka.
  • Used Kafka to load data into HDFS and move data into NoSQL databases.
  • Hands on experience in configuring and working with Flume to load the data from multiple sources directly into HDFS.
  • Involved in data ingestion into HDFS using Sqoop for full loads and Flume for incremental loads from a variety of sources such as web servers, RDBMS and data APIs.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Experience working with Spark SQL and DataFrames for faster execution of Hive queries using Spark SQLContext.
  • Involved in converting MapReduce jobs into transformations and actions using Spark RDDs and Spark DataFrames.
  • Experience in creating Pig and Hive UDFs to analyze data efficiently.
  • Hands-on experience designing, reviewing, implementing and optimizing data transformation processes in the Hadoop and Talend/Informatica ecosystems.
  • Experience with Sequence files and Avro, ORC and Parquet file formats, and gzip, Snappy and bz2 compression codecs.
  • Good experience in creating and designing data ingest pipelines using technologies such as Apache Storm - Kafka.
  • Integrated Apache Storm with Kafka to perform web analytics; uploaded clickstream data from Kafka to HDFS, HBase and Hive by integrating with Storm.
  • Experience in importing and exporting data using Sqoop from HDFS to relational database systems and vice-versa.
  • Strong experience in working with Elastic MapReduce and setting up environments on Amazon AWS EC2 instances.
  • Hands on NoSQL database experience with HBase and Cassandra.
  • Installation of Solr and configuring Solr Indexing of near real-time data.
  • Experience using CQL to execute queries on data persisting in the Cassandra cluster.
  • Involved in processing of data using Apache Tez and storing it to Cassandra.
  • Extensively worked on MongoDB concepts like locking, transactions, indexes, sharding, replication and schema design.
  • Extracted files from MongoDB through Sqoop and placed in HDFS and processed it.
  • Developed core search components using Apache Solr.
  • In-depth understanding/knowledge of Hadoop architecture and its various components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode and MapReduce concepts.
  • Experience with configuration of Hadoop ecosystem components: Hive, HBase, Pig, Sqoop, Mahout and Zookeeper.
  • Knowledge of job workflow scheduling and monitoring tools like Oozie and Zookeeper.
  • Experience in building, maintaining multiple Hadoop clusters (prod, dev etc.,) of different sizes and configuration and setting up the rack topology for large clusters.
  • Loading data from different source databases and files into Hive using Talend tool.
  • Experience creating reports and building dashboards using Tableau.
  • Created views in Tableau Desktop that were published to internal team for review and further data analysis and customization using filters and actions.
  • Experience in optimization of MapReduce algorithms using combiners and partitioners to deliver the best results.
  • Proficient in using Cloudera Manager, an end-to-end tool to manage Hadoop operations.
  • Worked with BI teams in generating the reports and designing ETL workflows on Tableau.
  • Followed Test-driven development under Agile, Waterfall and RUP methodologies to produce high-quality software.
  • Expertise in developing distributed business applications using EJB, implementing Session beans for business logic, Entity beans for persistence logic and Message-driven beans for asynchronous communication.
  • Used Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
  • Involved in installing and configuring the Hortonworks Distribution (HDP) and worked through the full SDLC under an Agile methodology.
  • Hands-on experience in developing applications with Java, J2EE - Servlets, JSP, EJB, SOAP, Web Services, JNDI, JMS, JDBC2, Hibernate, Struts, Spring, XML, HTML, XSD, XSLT, PL/SQL, Oracle 10g and MS SQL Server RDBMS.
  • Experience in Database design, Entity relationships, Database analysis, Programming SQL, Stored procedures PL/ SQL, Packages and Triggers in Oracle and SQL Server on Windows and UNIX.
  • Worked on different OS like UNIX/Linux, Windows NT, Windows XP, and Windows 2K.
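The combiner optimization mentioned above (pre-aggregating mapper output locally so fewer key/value pairs are shuffled to reducers) can be sketched in plain Python. This is a toy illustration of the pattern, not code from any of the engagements below; all names and data are made up.

```python
from collections import Counter, defaultdict

def mapper(line):
    # Emit (word, 1) for every word in the input line.
    return [(word, 1) for word in line.split()]

def combiner(pairs):
    # Local pre-aggregation on a single mapper's output: this is what
    # shrinks the shuffle, which is the point of the optimization.
    return list(Counter(dict()).items()) if not pairs else list(
        Counter(word for word, _ in pairs).items()
    )

def reducer(shuffled):
    # Final aggregation across all mappers' combined output.
    totals = defaultdict(int)
    for word, n in shuffled:
        totals[word] += n
    return dict(totals)

splits = ["big data big wins", "big data"]
combined = [combiner(mapper(s)) for s in splits]   # per-split partial counts
shuffled = [pair for part in combined for pair in part]
result = reducer(shuffled)
print(result)  # {'big': 3, 'data': 2, 'wins': 1}
```

Without the combiner step, six (word, 1) pairs would cross the shuffle here; with it, only five partial counts do, and the saving grows with input size.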

TECHNICAL SKILLS

Big Data: Cloudera Distribution, HDFS, Zookeeper, YARN, DataNode, NameNode, ResourceManager, NodeManager, MapReduce, Pig, Sqoop, HBase, Hive, Flume, Cassandra, MongoDB, Oozie, Kafka, Spark, Storm, Scala, Impala

Operating System: Windows, Linux, Unix.

Languages: Java, J2EE, SQL, Python, Scala

Databases: IBM DB2, Oracle, SQL Server, MySQL, PostgreSQL

Web Technologies: JSP, Servlets, HTML, CSS, JDBC, SOAP, XSLT.

Version Tools: GIT, SVN, CVS

IDE: IBM RAD, Eclipse, IntelliJ

Tools: TOAD, SQL Developer, ANT, Log4J

Web Services: WSDL, SOAP.

ETL: Talend ETL, Talend Studio

Web/App Server: UNIX server, Apache Tomcat

PROFESSIONAL EXPERIENCE

Confidential, Bethesda, MD

Sr. Data Engineer

Responsibilities:

  • Primary responsibilities include building scalable distributed data solutions using Hadoop ecosystem.
  • Worked on Hortonworks Data Platform Hadoop distribution for data querying using Hive to store and retrieve data.
  • Implemented Hive optimized joins to gather data from different sources and run ad-hoc queries on them.
  • Performed custom aggregate functions using Spark SQL and performed interactive querying.
  • Coordinated with Hortonworks and the development and operations teams on platform-level issues.
  • Extensively worked on creating combiners, partitioning, distributed cache to improve performance of MapReduce jobs.
  • Worked on Spark SQL and DataFrames for faster execution of Hive queries using Spark SQLContext.
  • Used Sqoop to transfer data between databases and HDFS, and used Kafka to stream log data from servers.
  • Used Pig to perform data transformations, event joins, filter and some pre-aggregations before storing the data onto HDFS.
  • Implemented different analytical algorithms using MapReduce programs to apply on top of HDFS data.
  • Worked on MongoDB database concepts such as locking, transactions, indexes, sharding, replication and schema design.
  • Implemented read preferences in a MongoDB replica set.
  • Used Apache Tez for processing data and storing it in MongoDB.
  • Familiar with MongoDB write concern to avoid loss of data during system failures.
  • Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
  • Extensively performed CRUD operations like put, get, scan, delete, update etc., on HBase database.
  • Wrote Hive Generic UDFs to perform business logic operations at the table level.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and preprocessing with Pig, Hive, Sqoop.
  • Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
  • Used Hive join queries to join multiple tables of a source system and load them into Elastic Search Tables.
  • Used Apache Kafka as messaging system to load log data, data from applications into HDFS system.
  • Developed POC using Scala and deployed on the Yarn cluster, compared the performance of Spark, with Hive and SQL.
  • Involved in converting Hive queries into Spark transformations using Spark RDDs, Python and Scala.
  • Worked on various file formats and compressions: Text, Avro and Parquet file formats with Snappy, bz2 and gzip compression.
  • Implemented test scripts to support test driven development and continuous integration.
  • Scheduled cron jobs for file system checks using fsck and wrote shell scripts to generate alerts.
  • Performed data scrubbing and processing with Oozie.
  • Loaded the analyzed Hive data into NoSQL databases such as HBase and MongoDB.
  • Provided technical support for the Research in Information Technology program.
  • Managed and upgraded Linux and OS X server systems.
  • Responsible for installation, configuration and management of Linux systems.
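The Hive-to-Spark conversion work described above follows a common shape: a HiveQL aggregate such as `SELECT dept, AVG(salary) FROM emp GROUP BY dept` becomes a map → reduceByKey → mapValues pipeline over RDD-style pairs. The sketch below mimics that pipeline in plain Python on made-up rows (the table, columns and values are illustrative, not from the actual project).

```python
# Hypothetical rows standing in for a Hive table: (dept, salary).
emp = [("eng", 100), ("eng", 120), ("ops", 80)]

# map: key each row by dept, carrying (salary, 1) so averages compose
# correctly when partial results are merged.
pairs = [(dept, (salary, 1)) for dept, salary in emp]

def reduce_by_key(pairs, merge):
    # Mimics Spark's reduceByKey: merge values pairwise per key.
    out = {}
    for key, value in pairs:
        out[key] = merge(out[key], value) if key in out else value
    return out

# reduceByKey: accumulate (sum, count) per department.
sums = reduce_by_key(pairs, lambda a, b: (a[0] + b[0], a[1] + b[1]))

# mapValues: turn each (sum, count) into an average.
avg_salary = {dept: total / n for dept, (total, n) in sums.items()}
print(avg_salary)  # {'eng': 110.0, 'ops': 80.0}
```

Carrying (sum, count) rather than a running average is the key design choice: sums compose across partitions, whereas averages of averages do not.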

Environment: Hadoop, Java, MapReduce, HDFS, Hive, Pig, Sqoop, Flume, Python, Spark, Impala, Scala, Kafka, Shell Scripting, Eclipse, Cloudera, MySQL, Talend, Cassandra

Confidential, MD

Big Data Engineer

Responsibilities:

  • Planned, designed and launched a solution for building a Hadoop cluster on the cloud using AWS EMR and EC2.
  • Converted MapReduce jobs into transformations and actions using Spark RDDs, DataFrames and Datasets.
  • Responsible for writing Apache Pig scripts and Hive queries for data quality analysis.
  • Used Flume to retrieve data from many sources into the Hadoop Distributed File System (HDFS).
  • Migrated the needed data from MySQL into HDFS using Sqoop and imported various formats of unstructured data from logs into HDFS using Flume.
  • Collected and aggregated large amounts of log data using Apache Flume and stored it in HDFS for future analysis.
  • Developed core search component using Apache Solr.
  • Installation of Solr and configuring Solr for Indexing of near real-time data.
  • Developed Spark-Cassandra connector to load data to and from Cassandra.
  • Worked with CQL to execute queries on data persisting in the Cassandra cluster.
  • Designed Spark applications in Scala and Python to interact with data stored in HDFS using SQLContext and access Hive tables using HiveContext.
  • Used Impala query engine to write queries to get faster results.
  • Defined job workflows as per dependencies in Oozie.
  • Developed the warehouse specific DataLake using Hive and Pig scripting and ETL Talend pipelines for populating the DataMarts for user/business consumption using Hive/Impala and Spark.
  • Experience in managing and reviewing Hadoop log files.
  • Migrated historical data from existing warehouses to Hadoop using Sqoop for scalable processing, and exported the eventual insights back with Sqoop.
  • Worked on Talend to run ETL jobs on the data in HDFS.
  • Built services, deployed models, algorithms, performed model training and provided tools to make our infrastructure more accessible.
  • Wrote shell scripts to monitor the health check of Hadoop daemon services and respond accordingly to any warning or failure conditions.
  • Responsible for Linux System Administration, DevOps, AWS Cloud platform and its features.
  • Implemented Elastic Search to decrease query times and increase search capabilities.
  • Extensively used S3 to store data and deployed EC2 instances using Elastic MapReduce (EMR) to perform analysis.
  • Configured a Virtual Private Cloud (VPC) with various subnets so that different teams could deploy their own clusters and scale the number of instances up or down as needed.
  • Supported data analysis projects using Elastic MapReduce on the Amazon Web Services (AWS) cloud.
  • Used Apache NiFi for ingestion of data from IBM MQ (message queues).
  • Implemented NiFi flow topologies to perform cleansing operations before moving data into HDFS.
  • Used Apache NiFi to copy data from the local file system into HDFS.
  • Scheduled data loading from multiple sources into Redshift using Kinesis Stream.
  • Used COPY and UNLOAD to move data to and from the Redshift database between on-premises systems and AWS.
  • Designed an Elastic Load Balancer (ELB) and launched it in subnets to distribute network traffic across multiple instances.
  • Supported the Redshift database using the STL, SVL, STV and SVV system tables/views, unloaded to S3/on-premises, copied from PostgreSQL and scheduled ELT from multiple sources using Kinesis Streams.
  • Worked with several Amazon Web Services such as EC2, ELB, VPC, S3, CloudFront, IAM, RDS, Route53, CloudWatch, Redshift, SNS, SQS, SES and Lambda, to name a few.
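The S3-to-Redshift loading described above is driven by Redshift's COPY command. A minimal sketch of rendering such a command follows; the table name, bucket path and IAM role ARN are placeholders, not values from the actual environment.

```python
def redshift_copy(table, s3_path, iam_role, fmt="CSV", gzip=False):
    """Render a Redshift COPY command as a SQL string.

    All arguments are caller-supplied; nothing here talks to AWS,
    it only builds the statement a client would then execute.
    """
    parts = [
        f"COPY {table}",
        f"FROM '{s3_path}'",
        f"IAM_ROLE '{iam_role}'",
        f"FORMAT AS {fmt}",
    ]
    if gzip:
        parts.append("GZIP")  # tell Redshift the S3 objects are gzipped
    return "\n".join(parts) + ";"

# Placeholder identifiers for illustration only.
sql = redshift_copy(
    table="analytics.events",
    s3_path="s3://example-bucket/events/2023/",
    iam_role="arn:aws:iam::123456789012:role/redshift-load",
    gzip=True,
)
print(sql)
```

UNLOAD (the reverse direction mentioned above) follows the same shape: `UNLOAD ('SELECT ...') TO 's3://...' IAM_ROLE '...'`.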

Environment: Cloudera, HDFS, Hive, HQL scripts, Map Reduce, Java, Cassandra, Pig, Sqoop, Kafka, Impala, Shell Scripts, Python Scripts, Spark, Scala, Oozie.

Confidential, TX

Hadoop Developer

Responsibilities:

  • Involved in implementing Hadoop Cluster and data integration in developing large-scale system software.
  • Worked on analyzing the Hortonworks Data Platform (HDP) and different Big Data analytic tools.
  • Worked on the ORC file format, bucketing and partitioning for Hive performance enhancement and storage improvement.
  • Setup security using Kerberos and AD on Hortonworks cluster.
  • Developed various Python scripts to find vulnerabilities in SQL queries by performing SQL injection tests, permission checks and performance analysis.
  • Worked extensively with Sqoop for importing data.
  • Installed Oozie workflow engine to run multiple Hive and Pig jobs which run independently with time and data availability.
  • Extensively used Pig for data cleansing.
  • Collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
  • Created queries in Hive to process large sets of structured, semi-structured and unstructured data and store in Managed and external tables and also created partition tables.
  • Experience with Sequence files, Avro, ORC and Parquet file formats and gzip, Snappy and bz2 compressions.
  • Developed Pig scripts to convert data from text files to Avro format.
  • Performed upgrades and configuration changes; commissioned/decommissioned nodes as needed on the go.
  • Evaluated usage of Oozie for Workflow Orchestration.
  • Supported and monitored MapReduce programs running on the cluster and provided production support.
  • Used Oozie to fetch data on a periodic basis in a timely fashion.
  • Managed Hadoop operations on a multi-node HDFS cluster using Cloudera Manager.
  • Involved in ETL transformation of OLTP data to the data warehouse, implementing all transformations using SSIS and SQL commands.
  • Created SSIS packages to extract data from OLTP to OLAP systems and scheduled jobs to call the packages and stored procedures.
  • Used Maven extensively for building jar files of MapReduce programs and deployed to cluster.
  • Involved in processing data in Hive tables using high-performance, low-latency HQL queries.
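The SQL vulnerability checks mentioned above can be illustrated with a miniature heuristic: flagging query strings that embed quoted literals (a sign of interpolated user input) rather than bind placeholders. A real audit would inspect the code that builds the queries; this toy check and its sample queries are illustrative only.

```python
import re

def looks_injectable(query: str) -> bool:
    """Crude heuristic: a quoted literal after '=' suggests interpolated
    input; a bind placeholder (? or %s) suggests parameterized SQL."""
    has_literal = re.search(r"=\s*'[^']*'", query) is not None
    has_placeholder = re.search(r"=\s*(\?|%s)", query) is not None
    return has_literal and not has_placeholder

# Illustrative queries, not from any real system.
unsafe = "SELECT * FROM users WHERE name = 'alice' OR '1'='1'"
safe = "SELECT * FROM users WHERE name = %s"
print(looks_injectable(unsafe), looks_injectable(safe))  # True False
```

The takeaway the check encodes: parameterized queries keep data out of the SQL text entirely, which is what closes the injection vector.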

Environment: Hadoop, HDFS, MapReduce, Yarn, Hive, PIG, Oozie, Sqoop, HBase, Flume, Linux, Shell scripting, Java, Eclipse, SQL

Confidential

Java Developer

Responsibilities:

  • Involved in requirements gathering and analysis from the existing system. Captured requirements using Use Cases and Sequence Diagrams.
  • Designed physical and logical data model and data flow diagrams.
  • Analyzed and modified existing code where required; responsible for gathering, documenting and maintaining business and system requirements and developing design documents.
  • Developed Enterprise Java Beans (Session Beans) to perform middleware services and interact with DAO layer to perform database operations like update, retrieve, insert and delete.
  • Implemented Ant and Maven build tools to build jar and war files and deployed war files to target servers.
  • Used Rally tool for the development of Agile-lifecycle management creating the stories, updating the tasks and reporting the bugs.
  • Involved in schema design and XML page implementation.
  • Developed Message Driven Bean components with WebSphere MQ Series for e-mailing and Data transfer between client and the providers.
  • Created business classes depending upon the requirements.
  • Involved in developing interfaces for web pages such as user registration and login, with registered access control for users depending on logins, using HTML, CSS and JavaScript/AJAX.
  • Analyzed data using complex SQL queries, across various databases.
  • As part of development, I was involved in gathering requirements.
  • Performed GitHub/GitHub-Desktop bash and terminal commands to clone, fetch, merge and push the code and created pull requests for changes that are made.
  • Involved in database design writing DDL and DML scripts.
  • Created several Exception classes to catch errors for a bug-free environment and logged the whole process using Log4j, which gives the ability to pinpoint errors.
  • Used a DB2 database to store the system data.
  • Involved in creating database objects such as views, tables and procedures.
  • Extensively used advanced PL/SQL features such as Records, Tables, Ref Cursors, Object types and Dynamic SQL.
  • Developed, implemented and unit tested the application environment.

Environment: Java, J2EE, Eclipse, WebLogic Application Server, Oracle, JSP, HTML, JavaScript, JMS, Servlets, UML, XML, Struts, Web Services, WSDL, SOAP, UDDI, ANT, JUnit, Log4j.

Confidential

Java Developer

Responsibilities:

  • Used both WebLogic portal 9.2 for Portal development and WebLogic 8.1 for Data services programming.
  • Involved in gathering requirements from business users.
  • Experience in design and development of database systems using relational database management systems including Oracle, MS SQL Server and MySQL.
  • Upgraded WebLogic servers in development, testing and production environments, applying patches and service packs.
  • Worked on creating EJBs that implement business logic.
  • WebLogic Administration, Monitoring and Troubleshooting using Admin Console and JMX and monitoring server health and service packs.
  • Involved in designing and development of the e-commerce site using JSP, Servlet, EJBs, JavaScript and JDBC.
  • Worked with data migration team, providing the mapping between the source and target systems.
  • Validated all forms using struts validation framework and implemented Tiles framework in the presentation layer.
  • Developed the Web Interface using Struts, JavaScript, HTML and CSS.
  • Developed JSP pages with Struts and EJB for implementing different search pages for transaction of each module.
  • Identified and implemented the user actions (Struts Action classes) and forms (Struts Form classes) as part of the Struts framework.
  • Involved in the design and coding of the data capture templates, presentation and component templates.
  • Designed intermediate database tables as per technical specifications.
  • Created web front end using JSP pages integrating AJAX and JavaScript coding that provide a rich browser based user interface.
  • Implemented the database using SQL Server.
  • Involved in bug fixing of various applications reported by the testing team during integration.
  • Designed tables and indexes.
  • Developed PL/SQL packages, procedures, functions to migrate the data from source to stage and stage to the targeting systems.
  • Wrote SQL queries, stored procedures, and triggers to perform back-end database operations by using SQL Server 2005.
  • Responsible for performing code reviews.

Environment: Java, J2EE, Eclipse, Weblogic Application Server, Oracle, JSP, HTML, JavaScript, JMS, Servlets, UML, XML, Struts, WSDL, SOAP, UDDI, ANT, JUnit, Log4j.
