Sr. Data Engineer Resume
Bethesda, MD
SUMMARY
- 8+ years of professional IT experience across a variety of industries, including hands-on experience with the Big Data ecosystem in the ingestion, storage, querying, processing and analysis of big data.
- Experience with different Hadoop distributions such as Cloudera (CDH), Hortonworks (HDP) and Amazon Elastic MapReduce (EMR).
- Hands-on experience developing predictive models using machine learning.
- Implemented various machine learning techniques such as random forest, k-means and logistic regression for prediction and pattern identification using Spark MLlib.
- Performed linear regression using the Spark Scala API.
- Responsible for building Hadoop clusters with the Hortonworks/Cloudera distributions and integrating them with the Pentaho Data Integration (PDI) server.
- Worked extensively with various machine learning algorithms and used NLTK, a natural language processing (NLP) library for Python, to build models.
- Good knowledge of deep learning, neural networks and convolutional neural networks (CNNs).
- Extensive experience with Spark/Scala, PySpark, MapReduce (MRv1) and MapReduce v2 (YARN).
- Designed and created data ingestion pipelines using technologies such as Apache Kafka.
- Used Kafka to load data into HDFS and move data into NoSQL databases.
- Hands-on experience configuring and working with Flume to load data from multiple sources directly into HDFS.
- Performed data ingestion into HDFS using Sqoop for full loads and Flume for incremental loads from a variety of sources such as web servers, RDBMS and data APIs.
- Converted Hive/SQL queries into Spark transformations using Spark RDDs and Scala (a brief sketch follows this summary).
- Experience working with Spark SQL and DataFrames for faster execution of Hive queries using the Spark SQLContext.
- Converted MapReduce jobs into transformations and actions using Spark RDDs and Spark DataFrames.
- Experience creating Pig and Hive UDFs to analyze data efficiently.
- Hands-on experience designing, reviewing, implementing and optimizing data transformation processes in the Hadoop and Talend/Informatica ecosystems.
- Experience with SequenceFile, Avro, ORC and Parquet file formats and gzip, Snappy and bz2 compression codecs.
- Good experience designing and creating data ingestion pipelines using technologies such as Apache Storm and Kafka.
- Integrated Apache Storm with Kafka to perform web analytics; uploaded clickstream data from Kafka to HDFS, HBase and Hive by integrating with Storm.
- Experience in importing and exporting data using Sqoop from HDFS to relational database systems and vice-versa.
- Strong experience in working with Elastic MapReduce and setting up environments on Amazon AWS EC2 instances.
- Hands on NoSQL database experience with HBase and Cassandra.
- Installed Solr and configured Solr indexing of near-real-time data.
- Experience using CQL to execute queries on data persisting in the Cassandra cluster.
- Involved in processing of data using Apache Tez and storing it to Cassandra.
- Extensively worked on MongoDB concepts like locking, transactions, indexes, sharding, replication and schema design.
- Extracted files from MongoDB using Sqoop, placed them in HDFS and processed them.
- Developed core search components using Apache Solr.
- In-depth understanding of the Hadoop architecture and its components, including HDFS, JobTracker, TaskTracker, NameNode, DataNode and MapReduce concepts.
- Experience configuring Hadoop ecosystem components: Hive, HBase, Pig, Sqoop, Mahout and ZooKeeper.
- Knowledge of job workflow scheduling and monitoring tools such as Oozie and ZooKeeper.
- Experience building and maintaining multiple Hadoop clusters (prod, dev, etc.) of different sizes and configurations, and setting up rack topology for large clusters.
- Loaded data from different source databases and files into Hive using Talend.
- Experience creating reports and building dashboards using Tableau.
- Created views in Tableau Desktop that were published to the internal team for review, further data analysis and customization using filters and actions.
- Experience optimizing MapReduce jobs using combiners and partitioners to deliver the best results.
- Proficient in using Cloudera Manager, an end-to-end tool for managing Hadoop operations.
- Worked with BI teams to generate reports and design ETL workflows in Tableau.
- Followed test-driven development under Agile, Waterfall and RUP methodologies to produce high-quality software.
- Expertise in developing distributed business applications using EJB, implementing session beans for business logic, entity beans for persistence logic and message-driven beans for asynchronous communication.
- Used Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
- Installed and configured the Hortonworks Distribution (HDP) and worked on the full SDLC following an agile methodology.
- Hands-on experience developing applications with Java, J2EE (Servlets, JSP, EJB), SOAP web services, JNDI, JMS, JDBC, Hibernate, Struts, Spring, XML, HTML, XSD, XSLT, PL/SQL, Oracle 10g and MS SQL Server RDBMS.
- Experience in database design, entity relationships, database analysis, SQL programming, PL/SQL stored procedures, packages and triggers in Oracle and SQL Server on Windows and UNIX.
- Worked on different operating systems including UNIX/Linux, Windows NT, Windows XP and Windows 2000.
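The Hive-to-Spark conversion referenced above can be illustrated with a minimal sketch using the Spark Java API with Hive support; the table and column names (web_logs, status, url) and the aggregation are hypothetical placeholders rather than details from any actual engagement.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    import static org.apache.spark.sql.functions.col;

    public class HiveToSparkSketch {
        public static void main(String[] args) {
            // SparkSession with Hive support so existing Hive tables are visible.
            SparkSession spark = SparkSession.builder()
                    .appName("hive-to-spark-sketch")
                    .enableHiveSupport()
                    .getOrCreate();

            // Original Hive query run as-is through Spark SQL.
            Dataset<Row> viaSql = spark.sql(
                    "SELECT url, COUNT(*) AS hits FROM web_logs WHERE status = 200 GROUP BY url");

            // Equivalent logic expressed as DataFrame transformations.
            Dataset<Row> viaDataFrame = spark.table("web_logs")
                    .filter(col("status").equalTo(200))
                    .groupBy(col("url"))
                    .count()                              // adds a "count" column per group
                    .withColumnRenamed("count", "hits");

            viaSql.show(10);
            viaDataFrame.show(10);
            spark.stop();
        }
    }

Running the same logic both through spark.sql and as DataFrame transformations is a common way to validate a migrated query before retiring the original Hive version.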
TECHNICAL SKILLS
Big Data: Cloudera Distribution, HDFS, ZooKeeper, YARN, DataNode, NameNode, ResourceManager, NodeManager, MapReduce, Pig, Sqoop, HBase, Hive, Flume, Cassandra, MongoDB, Oozie, Kafka, Spark, Storm, Scala, Impala
Operating System: Windows, Linux, Unix.
Languages: Java, J2EE, SQL, Python, Scala
Databases: IBM DB2, Oracle, SQL Server, MySQL, PostgreSQL
Web Technologies: JSP, Servlets, HTML, CSS, JDBC, SOAP, XSLT.
Version Tools: Git, SVN, CVS
IDE: IBM RAD, Eclipse, IntelliJ
Tools: TOAD, SQL Developer, ANT, Log4J
Web Services: WSDL, SOAP.
ETL: Talend ETL, Talend Studio
Web/App Server: UNIX server, Apache Tomcat
PROFESSIONAL EXPERIENCE
Confidential, Bethesda, MD
Sr. Data Engineer
Responsibilities:
- Primary responsibilities include building scalable distributed data solutions using the Hadoop ecosystem.
- Worked on the Hortonworks Data Platform (HDP) Hadoop distribution, using Hive to store, retrieve and query data.
- Implemented Hive optimized joins to gather data from different sources and run ad-hoc queries on them.
- Performed custom aggregate functions using Spark SQL and performed interactive querying.
- Coordinated with Hortonworks, the development team and the operations team on platform-level issues.
- Worked extensively on combiners, partitioning and the distributed cache to improve the performance of MapReduce jobs.
- Worked on Spark SQL and DataFrames for faster execution of Hive queries using the Spark SQLContext.
- Used Sqoop to transfer data between databases and HDFS, and used Kafka to stream log data from servers.
- Used Pig to perform data transformations, event joins, filter and some pre-aggregations before storing the data onto HDFS.
- Implemented different analytical algorithms using MapReduce programs to apply on top of HDFS data.
- Worked on MongoDB database concepts such as locking, transactions, indexes, sharding, replication and schema design.
- Implemented read preferences in a MongoDB replica set.
- Used Apache Tez for processing data and storing it in MongoDB.
- Familiar with MongoDB write concern to avoid loss of data during system failures.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Extensively performed CRUD operations like put, get, scan, delete, update etc., on HBase database.
- Wrote Hive generic UDFs to perform business-logic operations at the table level (a representative sketch appears at the end of this section).
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and preprocessing with Pig, Hive, Sqoop.
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
- Used Hive join queries to join multiple tables of a source system and load the results into Elasticsearch.
- Used Apache Kafka as a messaging system to load log data and application data into HDFS.
- Developed a POC in Scala, deployed it on the YARN cluster and compared the performance of Spark with Hive and SQL.
- Involved in converting Hive queries into Spark transformations using Spark RDDs, Python and Scala.
- Worked with various file formats (Text, Avro, Parquet) and compression codecs (Snappy, bz2, gzip).
- Implemented test scripts to support test driven development and continuous integration.
- Scheduled cron jobs for filesystem checks using fsck and wrote shell scripts to generate alerts.
- Performed data scrubbing and processing with Oozie.
- Loaded the analyzed Hive data into NoSQL databases such as HBase and MongoDB.
- Provided technical support for the Research in Information Technology program.
- Managed and upgraded Linux and OS X server systems.
- Responsible for the installation, configuration and management of Linux systems.
Environment: Hadoop, Java, MapReduce, HDFS, Hive, Pig, Sqoop, Flume, Python, Spark, Impala, Scala, Kafka, Shell Scripting, Eclipse, Cloudera, MySQL, Talend, Cassandra
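A minimal sketch of a Hive generic UDF of the kind mentioned above; the masking rule and the MaskValueUDF/mask_value names are illustrative assumptions, not the actual business logic.

    import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
    import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
    import org.apache.hadoop.hive.ql.metadata.HiveException;
    import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
    import org.apache.hadoop.io.Text;

    /** Illustrative generic UDF: mask_value(str) keeps the first character and hides the rest. */
    public class MaskValueUDF extends GenericUDF {

        @Override
        public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
            if (arguments.length != 1) {
                throw new UDFArgumentLengthException("mask_value expects exactly one argument");
            }
            // The UDF always returns a string.
            return PrimitiveObjectInspectorFactory.writableStringObjectInspector;
        }

        @Override
        public Object evaluate(DeferredObject[] arguments) throws HiveException {
            Object value = arguments[0].get();
            if (value == null) {
                return null;
            }
            String s = value.toString();
            String masked = s.isEmpty() ? s : s.charAt(0) + s.substring(1).replaceAll(".", "*");
            return new Text(masked);
        }

        @Override
        public String getDisplayString(String[] children) {
            return "mask_value(" + children[0] + ")";
        }
    }

After packaging the class into a JAR, ADD JAR followed by CREATE TEMPORARY FUNCTION mask_value AS '<package>.MaskValueUDF' makes it callable from Hive queries.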
Confidential, MD
Big Data Engineer
Responsibilities:
- Planned, designed and launched a solution for building a Hadoop cluster in the cloud using AWS EMR and EC2.
- Converted MapReduce jobs into transformations and actions using Spark RDDs, DataFrames and Datasets.
- Responsible for writing Apache Pig scripts and Hive queries for data quality analysis.
- Used Flume to ingest data from many sources into the Hadoop Distributed File System (HDFS).
- Migrated the required data from MySQL into HDFS using Sqoop and imported various formats of unstructured log data into HDFS using Flume.
- Collected and aggregated large amounts of log data using Apache Flume and stored it in HDFS for future analysis.
- Developed the core search component using Apache Solr.
- Installed Solr and configured it for indexing near-real-time data.
- Developed Spark-Cassandra connector to load data to and from Cassandra.
- Worked with CQL to execute queries on data persisting in the Cassandra cluster.
- Designed Spark applications in Scala and Python to interact with data stored in HDFS using SQLContext and access Hive tables using HiveContext.
- Used Impala query engine to write queries to get faster results.
- Defined job workflows as per dependencies in Oozie.
- Developed a warehouse-specific data lake using Hive and Pig scripting, with Talend ETL pipelines populating the data marts for user/business consumption using Hive/Impala and Spark.
- Experience in managing and reviewing Hadoop log files.
- Migrated historical data from existing warehouses to Hadoop using Sqoop for scalable processing, with the eventual insights exported back via Sqoop.
- Worked on Talend to run ETL jobs on the data in HDFS.
- Built services, deployed models, algorithms, performed model training and provided tools to make our infrastructure more accessible.
- Wrote shell scripts to monitor the health check of Hadoop daemon services and respond accordingly to any warning or failure conditions.
- Responsible for Linux System Administration, DevOps, AWS Cloud platform and its features.
- Implemented Elasticsearch to decrease query times and increase search capabilities.
- Extensively used S3 to store data and deployed EC2 instances using Elastic MapReduce (EMR) to perform analysis.
- Configured a Virtual Private Cloud (VPC) with separate subnets so that different teams could deploy their own clusters and scale the number of instances up or down as needed.
- Supported data analysis projects using Elastic MapReduce on the Amazon Web Services (AWS) cloud.
- Used Apache NiFi to ingest data from IBM MQ message queues.
- Implemented NiFi flow topologies to perform cleansing operations before moving data into HDFS.
- Used Apache NiFi to copy data from the local file system to HDP.
- Scheduled data loading from multiple sources into Redshift using Kinesis Streams.
- Used COPY and UNLOAD to move data between the Redshift database, on-premises systems and AWS (sketched briefly at the end of this section).
- Designed an Elastic Load Balancer (ELB) and launched it in subnets to distribute network traffic across multiple instances.
- Supported the Redshift database using the STL, SVL, STV and SVV system tables/views, unloaded data to S3/on-premises, copied from PostgreSQL and scheduled ELT from multiple sources using Kinesis Streams.
- Worked on several AWS services, including EC2, ELB, VPC, S3, CloudFront, IAM, RDS, Route 53, CloudWatch, Redshift, SNS, SQS, SES and Lambda, to name a few.
Environment: Cloudera, HDFS, Hive, HQL scripts, MapReduce, Java, Cassandra, Pig, Sqoop, Kafka, Impala, Shell Scripts, Python Scripts, Spark, Scala, Oozie.
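A minimal sketch of the Redshift COPY/UNLOAD pattern mentioned above, issued over a plain JDBC connection (assuming the Redshift JDBC driver is on the classpath); the endpoint, schema, table, bucket and IAM role are hypothetical placeholders.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;
    import java.util.Properties;

    public class RedshiftCopyUnloadSketch {
        public static void main(String[] args) throws Exception {
            // Hypothetical cluster endpoint, database and credentials.
            String url = "jdbc:redshift://example-cluster.abc123.us-east-1.redshift.amazonaws.com:5439/dev";
            Properties props = new Properties();
            props.setProperty("user", "etl_user");
            props.setProperty("password", System.getenv("REDSHIFT_PASSWORD"));

            try (Connection conn = DriverManager.getConnection(url, props);
                 Statement stmt = conn.createStatement()) {

                // Load staged files from S3 into a Redshift table.
                stmt.execute(
                    "COPY analytics.page_views " +
                    "FROM 's3://example-bucket/staging/page_views/' " +
                    "IAM_ROLE 'arn:aws:iam::123456789012:role/example-redshift-role' " +
                    "FORMAT AS CSV IGNOREHEADER 1");

                // Export query results back to S3 for downstream/on-premises consumers.
                stmt.execute(
                    "UNLOAD ('SELECT * FROM analytics.page_views WHERE view_date = CURRENT_DATE') " +
                    "TO 's3://example-bucket/exports/page_views_' " +
                    "IAM_ROLE 'arn:aws:iam::123456789012:role/example-redshift-role' " +
                    "ALLOWOVERWRITE");
            }
        }
    }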
Confidential, TX
Hadoop Developer
Responsibilities:
- Involved in implementing Hadoop Cluster and data integration in developing large-scale system software.
- Worked on analyzing the Hadoop distribution (HDP) and different big data analytics tools.
- Worked on the ORC file format, bucketing and partitioning for Hive performance enhancement and storage improvement.
- Setup security using Kerberos and AD on Hortonworks cluster.
- Developed various Python scripts to find vulnerabilities in SQL queries through SQL injection testing, permission checks and performance analysis.
- Worked extensively with Sqoop for importing data.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs that run independently based on time and data availability.
- Extensively used Pig for data cleansing.
- Collected and aggregated large amounts of log data using Apache Flume and staged it in HDFS for further analysis.
- Created Hive queries to process large sets of structured, semi-structured and unstructured data, stored the results in managed and external tables, and created partitioned tables.
- Experience with SequenceFile, Avro, ORC and Parquet file formats and gzip, Snappy and bz2 compression codecs.
- Developed Pig scripts to convert data from text files to Avro format.
- Performed upgrades and configuration changes, and commissioned/decommissioned nodes as needed.
- Evaluated usage of Oozie for Workflow Orchestration.
- Supported and monitored MapReduce programs running on the cluster and provided production support (a representative job is sketched at the end of this section).
- Used Oozie to fetch data on a periodic, scheduled basis.
- Managed Hadoop operations on a multi-node HDFS cluster using Cloudera Manager.
- Involved in the ETL transformation of OLTP data into the data warehouse, implementing all transformations using SSIS and SQL commands.
- Created SSIS packages to extract data from OLTP to OLAP systems and scheduled jobs to call the packages and stored procedures.
- Used Maven extensively to build JAR files for MapReduce programs and deployed them to the cluster.
- Processed data in the Hive tables using high-performance, low-latency HQL queries.
Environment: Hadoop, HDFS, MapReduce, YARN, Hive, Pig, Oozie, Sqoop, HBase, Flume, Linux, Shell scripting, Java, Eclipse, SQL
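A representative sketch of a Maven-built MapReduce program of the kind referenced above, counting events per type and reusing the reducer as a combiner to cut shuffle volume; the tab-delimited input layout and field position are assumptions for illustration.

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class EventCount {

        /** Emits (eventType, 1) for each tab-delimited log line; the field layout is an assumption. */
        public static class EventMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text eventType = new Text();

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                String[] fields = value.toString().split("\t");
                if (fields.length > 1) {
                    eventType.set(fields[1]);
                    context.write(eventType, ONE);
                }
            }
        }

        /** Sums counts per event type; also used as the combiner to reduce shuffle volume. */
        public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) {
                    sum += v.get();
                }
                context.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "event-count");
            job.setJarByClass(EventCount.class);
            job.setMapperClass(EventMapper.class);
            job.setCombinerClass(SumReducer.class);   // combiner pre-aggregates map output
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }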
Confidential
Java Developer
Responsibilities:
- Involved in requirements gathering and analysis from the existing system. Captured requirements using Use Cases and Sequence Diagrams.
- Designed physical and logical data model and data flow diagrams.
- Analyzed and modified existing code where required; responsible for gathering, documenting and maintaining business and system requirements and developing design documents.
- Developed Enterprise Java Beans (Session Beans) to perform middleware services and interact with DAO layer to perform database operations like update, retrieve, insert and delete.
- Implemented Ant and Maven build tools to build jar and war files and deployed war files to target servers.
- Used the Rally tool for Agile lifecycle management: creating stories, updating tasks and reporting bugs.
- Involved in schema design and XML page implementation.
- Developed message-driven bean components with WebSphere MQ for e-mailing and data transfer between the client and providers (a brief sketch appears at the end of this section).
- Created business classes depending upon the requirements.
- Developed web page interfaces such as user registration and login, with access control for registered users, using HTML, CSS and JavaScript/AJAX.
- Analyzed data using complex SQL queries, across various databases.
- Gathered requirements as part of the development effort.
- Performed GitHub/GitHub-Desktop bash and terminal commands to clone, fetch, merge and push the code and created pull requests for changes that are made.
- Involved in database design, writing DDL and DML scripts.
- Created several exception classes to catch errors and logged the whole process using Log4j, which makes it possible to pinpoint errors.
- Used a DB2 database to store the system data.
- Involved in creating database objects such as views, tables and procedures.
- Extensively used advanced features of PL/SQL such as records, tables, ref cursors, object types and dynamic SQL.
- Developed, implemented and unit tested the application.
Environment: Java, J2EE, Eclipse, WebLogic Application Server, Oracle, JSP, HTML, JavaScript, JMS, Servlets, UML, XML, Struts, Web Services, WSDL, SOAP, UDDI, ANT, JUnit, Log4j.
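A brief sketch of a message-driven bean of the kind referenced above; the queue name, mail handoff and EJB 3 annotation style are illustrative assumptions (the original work may well have used EJB 2.x deployment descriptors), and the binding to the actual WebSphere MQ queue would be application-server configuration.

    import javax.ejb.ActivationConfigProperty;
    import javax.ejb.MessageDriven;
    import javax.jms.JMSException;
    import javax.jms.Message;
    import javax.jms.MessageListener;
    import javax.jms.TextMessage;

    /**
     * Illustrative message-driven bean: consumes notification messages from a queue
     * (backed by WebSphere MQ in the original setup) and hands them to a mail sender.
     */
    @MessageDriven(mappedName = "jms/NotificationQueue", activationConfig = {
        @ActivationConfigProperty(propertyName = "destinationType", propertyValue = "javax.jms.Queue")
    })
    public class NotificationMDB implements MessageListener {

        @Override
        public void onMessage(Message message) {
            try {
                if (message instanceof TextMessage) {
                    String body = ((TextMessage) message).getText();
                    // Hypothetical helper; the real e-mail integration is not shown here.
                    MailSender.send("noreply@example.com", body);
                }
            } catch (JMSException e) {
                throw new RuntimeException("Failed to process notification message", e);
            }
        }
    }

    /** Stand-in for the e-mail component; a real implementation would use JavaMail or JMS. */
    class MailSender {
        static void send(String from, String body) {
            System.out.println("Sending mail from " + from + ": " + body);
        }
    }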
Confidential
Java Developer
Responsibilities:
- Used WebLogic Portal 9.2 for portal development and WebLogic 8.1 for data services programming.
- Involved in gathering requirements from business users.
- Experience in the design and development of database systems using relational database management systems including Oracle, MS SQL Server and MySQL.
- Upgraded WebLogic servers in the development, testing and production environments, applying patches and service packs.
- Worked on creating EJBs that implement business logic.
- WebLogic Administration, Monitoring and Troubleshooting using Admin Console and JMX and monitoring server health and service packs.
- Involved in designing and development of the e-commerce site using JSP, Servlet, EJBs, JavaScript and JDBC.
- Worked with data migration team, providing the mapping between the source and target systems.
- Validated all forms using struts validation framework and implemented Tiles framework in the presentation layer.
- Developed the Web Interface using Struts, JavaScript, HTML and CSS.
- Developed JSP pages with Struts and EJB for implementing different search pages for transaction of each module.
- Identified and implemented user actions (Struts Action classes) and forms (Struts ActionForm classes) as part of the Struts framework (a brief sketch appears at the end of this section).
- Involved in the design and coding of the data capture templates, presentation and component templates.
- Designed intermediate database tables as per technical specifications.
- Created the web front end using JSP pages integrating AJAX and JavaScript to provide a rich browser-based user interface.
- Implemented the database using SQL Server.
- Fixed bugs in the application reported by the testing team during integration.
- Designed tables and indexes.
- Developed PL/SQL packages, procedures, functions to migrate the data from source to stage and stage to the targeting systems.
- Wrote SQL queries, stored procedures, and triggers to perform back-end database operations by using SQL Server 2005.
- Responsible for performing code reviews.
Environment: Java, J2EE, Eclipse, WebLogic Application Server, Oracle, JSP, HTML, JavaScript, JMS, Servlets, UML, XML, Struts, WSDL, SOAP, UDDI, ANT, JUnit, Log4j.
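A brief sketch of a Struts 1 action and form bean of the kind referenced above; the ProductSearchAction/ProductSearchForm names, the keyword property and the "success" forward are hypothetical, and the forward itself would be defined in struts-config.xml.

    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    import org.apache.struts.action.Action;
    import org.apache.struts.action.ActionForm;
    import org.apache.struts.action.ActionForward;
    import org.apache.struts.action.ActionMapping;

    /** Illustrative Struts 1 action: reads a form bean, runs a lookup and forwards to a JSP. */
    public class ProductSearchAction extends Action {

        @Override
        public ActionForward execute(ActionMapping mapping, ActionForm form,
                                     HttpServletRequest request, HttpServletResponse response)
                throws Exception {
            ProductSearchForm searchForm = (ProductSearchForm) form;
            // Hypothetical lookup; a real action would delegate to an EJB or DAO layer.
            request.setAttribute("results", "Results for: " + searchForm.getKeyword());
            // "success" maps to a JSP result page in struts-config.xml.
            return mapping.findForward("success");
        }
    }

    /** Matching form bean, populated by Struts from the search page's request parameters. */
    class ProductSearchForm extends ActionForm {
        private String keyword;

        public String getKeyword() {
            return keyword;
        }

        public void setKeyword(String keyword) {
            this.keyword = keyword;
        }
    }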
