
Sr. Big Data Engineer Resume


Greenwood Village, CO

SUMMARY

  • Over 8 years of experience in the IT industry, including Big Data environments, the Hadoop ecosystem, Java, and the design, development, and maintenance of various applications.
  • Experience in developing custom UDFs for Pig and Hive to incorporate methods and functionality of Python/Java into Pig Latin and HQL (HiveQL); see the UDF sketch following this summary.
  • Expertise in core Java and JDBC, and proficient in using Java APIs for application development.
  • Expertise in JavaScript, JavaScript MVC patterns, object-oriented JavaScript design patterns, and AJAX calls.
  • Good experience in Tableau for Data Visualization and analysis on large data sets, drawing various conclusions.
  • Leveraged and integrated Google Cloud Storage and BigQuery applications, which connected to Tableau for end-user web-based dashboards and reports.
  • Good working experience with application and web servers such as JBoss and Apache Tomcat.
  • Good knowledge of Amazon Web Services (AWS) concepts such as the EMR and EC2 web services, which provide fast and efficient processing of Teradata Big Data Analytics.
  • Expertise in Big Data architectures such as Hadoop distributed systems (Azure, Hortonworks, Cloudera), MongoDB, and NoSQL.
  • Developed Spark-based applications to load streaming data with low latency using Kafka and PySpark.
  • Hands-on experience with Hadoop/Big Data technologies for the storage, querying, processing, and analysis of data.
  • Optimized machine learning algorithms for applications with 2M+ users.
  • Defined overall data architecture and design.
  • Responsible for data warehousing with the help of SSIS.
  • Experience in using PL/SQL to write Stored Procedures, Functions and Triggers.
  • Experience in development of Big Data projects using Hadoop, Hive, HDP, Pig, Flume, Storm and MapReduce open-source tools.
  • Used SSRS for reporting services.
  • Led and Managed Data Warehousing and Data Integration projects.
  • Architected an enterprise-wide consumer analytics data warehouse to facilitate onboarding, cross-selling, and consumer acquisition programs.
  • Experience in installation, configuration, supporting and managing Hadoop clusters.
  • Experience in working with MapReduce programs using Apache Hadoop for working with Big Data.
  • Experience in installation, configuration, supporting and monitoring Hadoop clusters using Apache, Cloudera distributions and AWS.
  • Strong hands-on experience with AWS services, including but not limited to EMR, S3, EC2, Route 53, RDS, ELB, DynamoDB, and CloudFormation.
  • Excellent working experience in Scrum / Agile framework and Waterfall project execution methodologies.
  • Hands on experience in Hadoop ecosystem including Spark, Kafka, HBase, Scala, Pig, Sqoop, Oozie, Flume, Storm, big data technologies.
  • Worked on Spark and Spark Streaming, using the core Spark API to explore Spark features and build data pipelines.
  • Experienced in working with different scripting technologies like Python, UNIX shell scripts.
  • Good knowledge of Amazon Web Services (AWS) concepts such as the EMR and EC2 web services; successfully loaded files to HDFS from Oracle, SQL Server, Teradata, and Netezza using Sqoop.
  • Excellent knowledge of Big Data infrastructure: distributed file systems (HDFS) and parallel processing (the MapReduce framework).
  • Extensive knowledge of IDE tools such as MyEclipse, RAD, IntelliJ, and NetBeans.
  • Expert in Amazon EMR, Spark, Kinesis, S3, ECS, ElastiCache, DynamoDB, and Redshift.
  • Experience in installation, configuration, supporting, and managing the Cloudera Hadoop platform, including CDH4 and CDH5 clusters.
  • Strong experience and knowledge of NoSQL databases such as MongoDB and Cassandra.
  • Experience in working with different data sources such as flat files, XML files, and databases.
  • Experience in database design, entity relationships, database analysis, and programming SQL, PL/SQL stored procedures, packages, and triggers in Oracle.
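The bullet on custom UDFs above refers to pushing Python (or Java) logic into HiveQL. As a hedged illustration only, the sketch below registers a Python function as a Spark SQL UDF so it can be called from HiveQL-style queries; the table and function names are invented, and the actual UDFs on these projects may equally have been written in Java.

```python
# Illustrative only: expose Python logic to HiveQL-style queries by registering
# it as a Spark SQL UDF. "customers" is a hypothetical Hive table.
from pyspark.sql import SparkSession
from pyspark.sql.types import StringType

spark = (
    SparkSession.builder
    .appName("udf-example")
    .enableHiveSupport()
    .getOrCreate()
)


def normalize_phone(raw):
    """Keep digits only, e.g. '(303) 555-0100' -> '3035550100'."""
    return "".join(ch for ch in (raw or "") if ch.isdigit())


spark.udf.register("normalize_phone", normalize_phone, StringType())

# The registered UDF can now be used inside HiveQL-style SQL.
spark.sql(
    "SELECT customer_id, normalize_phone(phone) AS phone_digits FROM customers"
).show()
```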

TECHNICAL SKILLS

Hadoop/Big Data: HDFS, MapReduce, Sqoop, Hive, Pig, HBase, Zookeeper, cluster configuration, Flume, AWS

Distributions: Cloudera

Java Technologies: Core Java, JDBC, HTML, JSP, Servlets, Tomcat, JavaScript

Databases: SQL, NoSQL (HBase), MySQL, Oracle, PL/SQL

Programming Languages: C, C++, Java, SQL, Shell, Python

IDEs/Utilities: Eclipse

Web Technologies: J2EE, JMS, Web Service

Protocols: TCP/IP, SSH, HTTP and HTTPS

Scripting: HTML, JavaScript, CSS, XML and Ajax

Operating System: Windows, Mac, Linux and UNIX

IDE: Eclipse, Microsoft Visual Studio 2008, 2012, Flex Builder

Version control: Git, SVN, CVS

Tools: FileZilla, Putty, PL/SQL Developer, Junit

PROFESSIONAL EXPERIENCE

Sr. Big Data Engineer

Confidential, Greenwood Village, CO

Responsibilities:

  • Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.
  • Responsible for Big data initiatives and engagement including analysis, brainstorming, POC, and architecture.
  • Loaded and transformed large sets of structured, semi structured and unstructured data using Hadoop/Big Data concepts.
  • Led the modern data architecture practice and delivered Big Data and cloud technology projects.
  • Installed and Configured Apache Hadoop clusters for application development and Hadoop tools.
  • Installed and configured Hive, wrote Hive UDFs, and used a repository of UDFs for Pig Latin.
  • Developed data pipeline using Pig, Sqoop to ingest cargo data and customer histories into HDFS for analysis.
  • Migrated the existing on-premises code to an AWS EMR cluster.
  • Installed and configured Hadoop Ecosystem components and Cloudera manager using CDH distribution.
  • Applied machine learning libraries and algorithms to optimize existing data.
  • Involved in all phases of SDLC using Agile and participated in daily scrum meetings with cross teams.
  • Worked on modeling of Dialog process, Business Processes and coding Business Objects, Query Mapper and JUnit files.
  • Created automated pipelines in AWS Code Pipeline to deploy Docker containers in AWS ECS using S3.
  • Used HBase NoSQL Database for real time and read/write access to huge volumes of data in the use case.
  • Extracted real-time feeds using Spark Streaming, converted them to RDDs, processed the data into DataFrames, and loaded the data into HBase.
  • Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources such as S3 (ORC/Parquet/text files) into AWS Redshift.
  • Developed an AWS Lambda function to invoke the Glue job as soon as a new file is available in the inbound S3 bucket (see the Lambda sketch after this list).
  • Created Spark jobs to apply data cleansing/data validation rules to new source files in the inbound bucket and to route rejected records to a reject-data S3 bucket.
  • Research and develop state of the art techniques in the field of Machine Learning.
  • Created HBase tables to load large sets of semi-structured data coming from various sources.
  • Responsible for loading the customer's data and event logs from Kafka into HBase using REST API.
  • Created tables along with sort and distribution keys in AWS Redshift.
  • Created PySpark DataFrames to bring data from DB2 to Amazon S3 (see the PySpark sketch after this list).
  • Created shell scripts and Python scripts to automate daily tasks, including production tasks.
  • Created, altered, and deleted Kafka topics as required.
  • Used cloud computing on the multi-node cluster, deployed the Hadoop application on cloud S3, and used Elastic MapReduce (EMR) to run MapReduce jobs.
  • Developed analytics enablement layer using ingested data that facilitates faster reporting and dashboards.
  • Worked with production support team to provide necessary support for issues with CDH cluster and the data ingestion platform.
  • Provide guidance to development team working on PySpark as ETL platform.
  • Created Hive External tables to stage data and then move the data from Staging to main tables
  • Implemented the Big Data solution using Hadoop, hive and Informatica to pull/load the data into the HDFS system.
  • Pulled data from the data lake (HDFS) and massaged it with various RDD transformations.
  • Used Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark data frames, Scala and Python.
  • Developed and maintained batch data flows using HiveQL and UNIX scripting.
  • Developed and execute data pipeline testing processes and validate business rules and policies.
  • Built code for real time data ingestion using MapR-Streams.
  • Implemented Spark using Python and Spark SQL for faster processing of data.
  • Automated unit testing using Python and applied different testing methodologies such as unit testing and integration testing.
  • Used Hive join queries to join multiple tables of a source system and load them into Elasticsearch tables.
  • Involved in writing SQL queries to validate the data between source and target systems
  • Implemented different data formatter capabilities and publishing to multiple Kafka Topics.
  • Wrote automated HBase test cases for data quality checks using HBase command-line tools.
  • Involved in development of Hadoop System and improving multi-node Hadoop Cluster performance.
  • Developed and implemented Apache NiFi across various environments and wrote QA scripts in Python for tracking files.
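A minimal sketch of the S3-triggered Lambda-to-Glue handoff described above, assuming a hypothetical Glue job name and standard S3 ObjectCreated event notifications; the real job names and arguments are not taken from this resume.

```python
# Hypothetical sketch: when a new file lands in the inbound bucket, start the
# Glue job that loads it into Redshift. Job name and argument keys are assumed.
import urllib.parse

import boto3

glue = boto3.client("glue")

GLUE_JOB_NAME = "campaign-s3-to-redshift"  # assumed name


def lambda_handler(event, context):
    """Invoked by an S3 ObjectCreated event on the inbound bucket."""
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        # Pass the new object's location to the Glue job as job arguments.
        response = glue.start_job_run(
            JobName=GLUE_JOB_NAME,
            Arguments={"--source_bucket": bucket, "--source_key": key},
        )
        print(f"Started Glue run {response['JobRunId']} for s3://{bucket}/{key}")
```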
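Likewise, a hedged PySpark sketch of the DB2-to-S3 ingestion bullet above; the JDBC endpoint, table, credentials, and bucket are placeholders rather than project values.

```python
# Illustrative DB2 -> S3 extract via JDBC, landed as Parquet.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("db2-to-s3").getOrCreate()

db2_df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:db2://db2-host:50000/SAMPLEDB")   # assumed endpoint
    .option("dbtable", "SCHEMA1.CUSTOMER_EVENTS")          # assumed table
    .option("user", "db2_user")
    .option("password", "db2_password")
    .option("driver", "com.ibm.db2.jcc.DB2Driver")
    .load()
)

# Stamp a load date and write to S3 as Parquet, partitioned for downstream jobs.
(
    db2_df.withColumn("load_date", F.current_date())
    .write.mode("overwrite")
    .partitionBy("load_date")
    .parquet("s3a://example-landing-bucket/db2/customer_events/")
)
```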

Environment: Hadoop 3.0, MapReduce, Hive 3.0, Agile, HBase 1.2, PySpark, NoSQL, AWS, Kafka, Pig 0.17, HDFS, Java 8, Hortonworks, Spark, PL/SQL, Python

Big Data Architect

Confidential, Bothell, WA

Responsibilities:

  • Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.
  • Loaded and transformed large sets of structured, semi structured and unstructured data using Hadoop/Big Data concepts.
  • Created the automated build and deployment process for application, re-engineering setup for better user experience, and leading up to building a continuous integration system.
  • Implemented MapReduce programs to retrieve results from unstructured data set.
  • Optimized MapReduce Jobs to use HDFS efficiently by using various compression mechanisms.
  • Designed and worked on a Big Data analytics platform for processing customer interface preferences and comments using Hadoop, Hive, Pig, and Cloudera.
  • Imported and exported data between Oracle and HDFS/Hive using Sqoop.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Responsible for ETL and Data Architecture projects.
  • Worked on reading multiple data formats on HDFS using Scala.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Installed and configured Pig and wrote Pig Latin scripts.
  • Developed multiple POCs using Scala and deployed on the Yarn cluster, compared the performance of Spark, with Hive and SQL.
  • Analyzed the SQL scripts and designed the solution to implement using Scala.
  • Built data platforms, pipelines, and storage systems using Apache Kafka, Apache Storm, and search technologies such as Elasticsearch.
  • Worked on querying data using Spark SQL on top of PySpark engine.
  • Experienced in implementing POC's to migrate iterative MapReduce programs into Spark transformations using Scala.
  • Developed Spark scripts by using Python and Scala shell commands as per the requirement.
  • Experienced with batch processing of data sources using Apache Spark, Elastic search.
  • Experienced in the AWS cloud environment, including S3 storage and EC2 instances.
  • Developed Spark jobs using Scala in test environment for faster data processing and used Spark SQL for querying.
  • Configured Spark Streaming to receive real-time data from Kafka and store the stream data to HDFS (see the streaming sketch after this list).
  • Designed and implemented SOLR indexes for the metadata that enabled internal applications to reference Scopus content.
  • Used Spark for Parallel data processing and better performances using Scala.
  • Extensively used Pig for data cleansing and to extract data from web server output files for loading into HDFS.
  • Developed a data pipeline using Kafka and Storm to store data into HDFS.
  • Implemented Kafka producers to create custom partitions, configured brokers, and implemented high-level consumers to build the data platform.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Analyzed large amounts of data sets to determine optimal way to aggregate and report on it using MapReduce programs.
  • Developed simple to complex MapReduce streaming jobs using Python.
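A minimal PySpark Structured Streaming sketch of the Kafka-to-HDFS flow referenced above (the project itself used Scala); the broker, topic, and paths are assumptions, and the spark-sql-kafka connector must be on the classpath.

```python
# Illustrative Kafka -> HDFS stream using Structured Streaming.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")  # assumed broker
    .option("subscribe", "customer-events")             # assumed topic
    .option("startingOffsets", "latest")
    .load()
    # Kafka delivers key/value as binary; cast the payload to string for storage.
    .select(col("key").cast("string"), col("value").cast("string"), col("timestamp"))
)

query = (
    events.writeStream.format("parquet")
    .option("path", "hdfs:///data/streams/customer_events")           # assumed path
    .option("checkpointLocation", "hdfs:///checkpoints/customer_events")
    .outputMode("append")
    .start()
)

query.awaitTermination()
```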

Environment: Pig 0.17, Hive 2.3, HBase 1.2, Sqoop 1.4, Flume 1.8, zookeeper, AWS, MapReduce, HDFS, Cloudera, Scala, Spark 2.3, SQL, Apache Kafka 1.0.1, Apache Storm, Python, Unix.

Sr. Big Data Architect

Confidential, CA

Responsibilities:

  • Contributing to the development of key data integration and advanced analytics solutions leveraging Apache Hadoop and other big data technologies for leading organizations using major Hadoop Distributions like Hortonworks.
  • Involved in Agile methodologies, daily Scrum meetings, Sprint planning.
  • Performed Data transformations in HIVE and used partitions, buckets for performance improvements.
  • Created Hive external tables on the MapReduce output, with partitioning and bucketing applied on top.
  • Gathered data requirements and design enhancements to the data warehouse.
  • Developed business-specific custom UDFs in Hive and Pig.
  • Developed end-to-end architecture designs for big data solutions based on a variety of business use cases.
  • Worked as a Spark expert and performance optimizer.
  • Member of the Spark COE (Center of Excellence) in the Data Simplification project at Cisco.
  • Experienced with SparkContext, Spark SQL, DataFrames, and pair RDDs.
  • Handled data skew in Spark SQL (see the salting sketch after this list).
  • Implemented Spark using Scala and Java, utilizing DataFrames and the Spark SQL API for faster processing of data.
  • Developed Spark code and Spark SQL/Streaming jobs for faster testing and processing of data.
  • Developed a data pipeline using Kafka, HBase, Spark, and Hive to ingest, transform, and analyze customer behavioral data; also developed Spark jobs and Hive jobs to summarize and transform data.
  • Worked extensively with importing metadata into Hive and migrated existing tables and applications to work on Hive and Spark
  • Implemented Sqoop imports from Oracle to Hadoop and loaded the data back in Parquet format.
  • Enhancements to traditional data warehouse based on STAR schema, update data models, perform Data Analytics and Reporting using Tableau.
  • Used Spark for interactive queries, processing of streaming data, and integration with popular NoSQL databases for huge volumes of data; worked with the MapR distribution and HDFS.
  • Handled importing data from different data sources into HDFS using Sqoop, performed transformations using Hive and MapReduce, and then loaded the data into HDFS.
  • Collected and aggregated large amounts of log data using Flume and staged the data in HDFS for further analysis.
  • Designed and maintained Test workflows to manage the flow of jobs in the cluster.
  • Worked with the testing teams to fix bugs and ensured smooth and error-free code
  • Prepared documentation such as functional specification and deployment instruction documents.
  • Fixed defects during the QA phase, supported QA testing, troubleshot defects, and identified the sources of defects.
  • Involved in installing Hadoop Ecosystem components (Hadoop, MapReduce, Spark, Pig, Hive, Sqoop, Flume, Zookeeper and HBase).
  • Worked collaboratively with all levels of business stakeholders to architect, implement and test Big Data based analytical solution from disparate sources.
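One common way to handle the Spark SQL data skew mentioned above is key salting; the sketch below is a hedged illustration with invented table and column names, not the project's actual code.

```python
# Illustrative skew handling: salt the hot keys on the large side of a join and
# replicate the small side across all salt values.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("skew-salting").getOrCreate()
NUM_SALTS = 16

large = spark.table("clickstream")    # hypothetical fact table, skewed on customer_id
small = spark.table("customer_dim")   # hypothetical dimension table

# Large side: attach a random salt to each row so hot keys spread across partitions.
large_salted = large.withColumn("salt", (F.rand() * NUM_SALTS).cast("int"))

# Small side: replicate every row once per salt value.
salts = spark.range(NUM_SALTS).select(F.col("id").cast("int").alias("salt"))
small_salted = small.crossJoin(salts)

# Join on (customer_id, salt), then drop the helper column.
joined = (
    large_salted.join(small_salted, on=["customer_id", "salt"], how="inner")
    .drop("salt")
)
joined.groupBy("customer_id").count().show()
```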

Environment: AWS S3, RDS, EC2, Redshift, Hadoop 3.0, Hive 2.3, Pig, Sqoop 1.4.6, Oozie, HBase 1.2, Flume 1.8, Hortonworks, MapReduce, Kafka, HDFS, Oracle 12c, Microsoft, Java, GIS, Spark 2.2, Zookeeper

Hadoop Developer/ Admin

Confidential, Dallas, TX

Responsibilities:

  • Implemented solutions for ingesting data from various sources and processing the data at rest utilizing Big Data technologies such as Hadoop, the MapReduce framework, HBase, and Hive.
  • Used Sqoop to efficiently transfer data between databases and HDFS and used Flume to stream the log data from servers.
  • Developed the full SDLC of the AWS Hadoop cluster based on the client's business needs.
  • Involved in loading and transforming large sets of structured, semi structured and unstructured data from relational databases into HDFS using Sqoop imports.
  • Implemented an enterprise-grade platform (MarkLogic) for ETL from mainframe to NoSQL (Cassandra).
  • Responsible for importing log files from various sources into HDFS using Flume.
  • Analyzed data using HiveQL to generate payer reports from payment summaries for transmission to payers.
  • Imported millions of structured data from relational databases using Sqoop import to process using Spark and stored the data into HDFS in CSV format.
  • Responsible for designing and formulating the architecture, development, and engineering of Big Data solutions of a company.
  • Installed and deployed the Hadoop cluster, added and removed nodes, monitored tasks and all critical parts of the cluster, configured the NameNode, and took backups.
  • Used Data Frame API in Scala for converting the distributed collection of data organized into named columns.
  • Performed data profiling and transformation on the raw data using Pig, Python, and Java.
  • Responsible for keeping the Hadoop clusters running smoothly in production.
  • Developed predictive analytics using Apache Spark and Scala APIs.
  • Involved in big data analysis using Pig and user-defined functions (UDFs).
  • Created Hive External tables and loaded the data into tables and query data using HQL.
  • Implemented Spark Graph application to analyze guest behavior for data science segments.
  • Enhancements to traditional data warehouse based on STAR schema, update data models, perform Data Analytics and Reporting using Tableau.
  • Involved in migration of data from the existing RDBMS (Oracle and SQL Server) to Hadoop using Sqoop for processing.
  • Developed Shell, Perl and Python scripts to automate and provide Control flow to Pig scripts.
  • Developed a prototype for Big Data analysis using Spark, RDDs, DataFrames, and the Hadoop ecosystem with CSV, JSON, Parquet, and HDFS files (see the sketch after this list).
  • Developed Hive SQL scripts for performing transformation logic and loading the data from staging zone to landing zone and Semantic zone.
  • Involved in creating Oozie workflow and Coordinator jobs for Hive jobs to kick off the jobs on time for data availability.
  • Worked on the Oozie scheduler to automate the pipeline workflow and orchestrate the Sqoop, Hive, and Pig jobs that extract the data in a timely manner.
  • Exported the generated results to Tableau for testing by connecting to the corresponding Hive tables using Hive ODBC connector.
  • Exported data from HDFS to RDBMS via Sqoop for Business Intelligence, visualization and user report generation.
  • Managed and led the development effort with the help of a diverse internal and overseas group.
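A hedged sketch of the multi-format Spark prototype noted above: read CSV, JSON, and Parquet from HDFS into DataFrames and combine them for analysis. Paths and column names are placeholders, not project values.

```python
# Illustrative multi-format ingestion and aggregation in PySpark.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("multi-format-prototype").getOrCreate()

csv_df = (
    spark.read.option("header", "true").option("inferSchema", "true")
    .csv("hdfs:///data/raw/claims_csv/")
)
json_df = spark.read.json("hdfs:///data/raw/claims_json/")
parquet_df = spark.read.parquet("hdfs:///data/curated/claims_parquet/")

# Align the three sources on a shared subset of columns before combining them.
common_cols = ["claim_id", "payer_id", "amount"]
combined = (
    csv_df.select(common_cols)
    .unionByName(json_df.select(common_cols))
    .unionByName(parquet_df.select(common_cols))
)

combined.groupBy("payer_id").sum("amount").show()
```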

Environment: Big Data, Spark, YARN, Hive, Pig, JavaScript, JSP, HTML, Ajax, Scala, Python, Hadoop, AWS, DynamoDB, Kibana, Cloudera, EMR, JDBC, Redshift, NoSQL, Sqoop, MySQL.

Hadoop Developer/Admin

Confidential

Responsibilities:

  • Involved in start to end process of Hadoop cluster setup where in installation, configuration and monitoring the Hadoop Cluster.
  • Automated Hadoop cluster setup and implemented Kerberos security for various Hadoop services using Hortonworks.
  • Responsible for Cluster maintenance, commissioning and decommissioning Data nodes, Cluster Monitoring, Troubleshooting, Manage and review data backups, Manage & review Hadoop log files.
  • Monitoring systems and services, architecture design and implementation of Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
  • Installation of various Hadoop Ecosystems and Hadoop Daemons.
  • Responsible for Installation and configuration of Hive, Pig, HBase and Sqoop on the Hadoop cluster.
  • Configured various property files like core-site.xml, hdfs-site.xml, mapred-site.xml based upon the job requirement
  • Involved in loading data from UNIX file system to HDFS, Importing and exporting data into HDFS using Sqoop, experienced in managing and reviewing Hadoop log files.
  • Responsible for data extraction and data ingestion from different data sources into Hadoop Data Lake by creating ETL pipelines using Pig, and Hive
  • Managed and reviewed Hadoop Log files as a part of administration for troubleshooting purposes. Communicate and escalate issues appropriately.
  • Extracted meaningful data from dealer CSV files, text files, and mainframe files and generated Python pandas reports for data analysis (see the pandas sketch after this list).
  • Developed Python code using version control tools such as GitHub and SVN on Vagrant machines.
  • Performed data analysis, feature selection, feature extraction using Apache Spark Machine Learning streaming libraries in Python.
  • Involved in analyzing system failures, identifying root causes, and recommending courses of action. Documented system processes and procedures for future reference.
  • Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters. Involved in Installing and configuring Kerberos for the authentication of users and Hadoop daemons.

Environment: Hortonworks, Hadoop, HDFS, Pig, Hive, Sqoop, Flume, Storm, UNIX, Cloudera Manager, Zookeeper, HBase, Python, Spark, Apache, SQL, ETL

Big Data Developer

Confidential

Responsibilities:

  • Involved in complete SDLC life cycle of big data project that includes requirement analysis, design, coding, testing and production
  • Extensively used Sqoop to import/export data between RDBMS and Hive tables, performed incremental imports, and created Sqoop jobs based on the last saved value.
  • Built custom MapReduce programs to analyze data and used Pig Latin to clean unwanted data.
  • Installed and configured Hive and wrote Hive UDFs to successfully implement business requirements.
  • Involved in creating Hive tables, loading data into tables, and writing Hive queries that run as MapReduce jobs.
  • Experienced in using different kinds of compression techniques (LZO, Snappy, etc.) in Hive tables to save space and optimize data transfer over the network.
  • Implemented custom interceptors for flume to filter data and defined channel selectors to multiplex the data into different sinks.
  • Experience in working with Spark SQL for processing data in the Hive tables.
  • Developing Scripts and Tidal Jobs to schedule a bundle (group of coordinators), which consists of various Hadoop Programs using Oozie.
  • Involved in writing test cases, implementing unit test cases.
  • Installed Oozie workflow engine to run multiple Hive and Pig jobs which run independently with time and data availability.
  • Hands-on experience accessing and performing CRUD operations against HBase data using the Java API.
  • Analyzed the data by performing Hive queries and running Pig scripts to know user behavior.
  • Implemented a POC to migrate MapReduce jobs into Spark RDD transformations using Scala (see the RDD sketch after this list).
  • Developed Spark applications using Scala for easy Hadoop transitions.
  • Extensively used Hive queries to query data according to the business requirement.
  • Used Pig for analysis of large data sets and loaded the results back into HBase using Pig.
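A minimal PySpark RDD sketch of the MapReduce-to-Spark migration POC noted above (the POC itself was written in Scala); a classic word-count shape stands in for the real jobs, and the paths are placeholders.

```python
# Illustrative MapReduce-style job expressed as Spark RDD transformations.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("mr-to-spark-poc").getOrCreate()
sc = spark.sparkContext

lines = sc.textFile("hdfs:///data/raw/access_logs/")     # assumed input path

counts = (
    lines.flatMap(lambda line: line.split())              # "mapper": emit tokens
    .map(lambda word: (word, 1))                          # "mapper": (key, 1) pairs
    .reduceByKey(lambda a, b: a + b)                      # "reducer": sum per key
)

counts.saveAsTextFile("hdfs:///data/out/word_counts")     # assumed output path
```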

Environment: Hadoop, HDFS, MapReduce, Hive, Flume, Sqoop, Pig, MySQL, Ubuntu, Zookeeper, CDH3/4 Distribution, Java Eclipse, Oracle, Shell Scripting.
