Big Data Engineer Resume
Seattle, WA
SUMMARY
- 8 years of professional experience in the IT industry using a wide variety of technologies in all phases of the development life cycle. Expertise in MSBI and Big Data technologies as an engineer, with proven ability in project-based leadership, teamwork, and communication.
- Working as a Big Data architect providing solutions for big data problems.
- Extensively worked on the Hadoop framework with Hortonworks distributions - HDP 2.2, HDP 2.3, HDP 2.4, and HDP 2.5.
- Experience working in Hadoop-as-a-Service (HaaS) environments with Syncsort DMX-h, Subversion (SVN), and SQL and NoSQL databases.
- Able to identify the viability of a business problem for a big data solution, define the logical architecture of its layers and components (including data capacity planning and node forecasting), and select the right products to implement it.
- Familiar with data architecture including data ingestion pipeline design, Hadoop information architecture, data modeling and data mining, machine learning and advanced data processing. Experience optimizing ETL workflows.
- Experience in architecting, designing, installing, configuring, and managing Apache Hadoop clusters and the Cloudera Hadoop Distribution.
- Excellent knowledge of Hadoop architecture, the HDFS framework, and ecosystem components such as MapReduce (MR), Hive, ZooKeeper, Pig, HBase, Sqoop, Oozie, Flume, and Spark for data extraction, storage, and analysis.
- Solid understanding of Hadoop MR v1 and MR v2 (YARN) architecture.
- Extensively worked on Hive for ETL Transformations and optimized Hive Queries
- Worked with relational database systems (RDBMS) such as MySQL, MS SQL Server, and Oracle, and NoSQL database systems such as HBase and Cassandra. Extensive experience working with IDEs such as Eclipse, NetBeans, and EditPlus.
- Involved in analysis, database design and development, implementation of BI, client/server and enterprise applications using SSIS, SSAS and SQL Server.
- Extensive knowledge of data warehousing, OLAP, and dimensional data modeling for fact and dimension tables using Analysis Services.
- Experienced in calculating measures and members in SQL Server Analysis Services (SSAS) using Multidimensional Expressions (MDX) and mathematical formulas.
- Skilled with data transformations, tasks, containers, sources, and destinations, including Derived Column, Conditional Split, Sort, and Merge Join transformations, to load data into the data warehouse.
- Excellent experience in Database Design and Data Modeling (both OLTP & OLAP)
- Used all kinds of SQL Server constraints (primary key, foreign key, default, check, unique, etc.) and generated complex Transact-SQL (T-SQL) queries and subqueries.
- An individual with excellent interpersonal and communication skills, strong business acumen, creative problem solving skills, technical competency, team-player spirit, and leadership skills.
- Experience with software development processes and models: Agile, Waterfall, and Scrum.
- Developed user and customer stories that are compatible with technical requirements, the Cherwell CSM platform, and ITIL best practices to fully configure the IT service management tool.
- Experience in all aspects of source control, configuration management, software lifecycle processes.
- Extensive knowledge in designing and configuring filters, escalations, One-Steps, and automation processes according to business requirements.
- Efficient in interacting with business requirements teams, providing necessary support to executive staff, and working independently.
TECHNICAL SKILLS
Programming Languages: C, Java (J2SE), MapReduce, Pig, HiveQL, SQL, CSS, HTML
Databases: NoSQL, SQL Server, MySQL, Oracle, Teradata, Toad Data Point
Big Data Technologies: Hadoop, HDFS, Hive, MapReduce, Pig, Sqoop, Flume, ZooKeeper, HBase, Spark
Hadoop Distributions: Cloudera, Hortonworks
ETL Tool: SQL Server Integration Services (SSIS)
OLAP Tool: SQL Server Analysis Services (SSAS)
Reporting Tools: SQL Server Reporting Services (SSRS), Tableau
Monitoring Tools: Oozie, Ambari, Nagios, Ranger
ITSM Tools: Cherwell, Remedy, Peregrine
Applications: Microsoft Office Suite (Word, Excel, PowerPoint, Visio)
Operating Systems: Linux (Ubuntu, CentOS), UNIX, Windows
PROFESSIONAL EXPERIENCE
Confidential | Seattle, WA
Big data Engineer
Responsibilities:
- Gathering data from multiple sources like Teradata, Oracle and SQL Server using Sqoop and loading to HDFS
- Installed and configured Hadoop MapReduce, HDFS and developed multiple MapReduce jobs in Java for data cleansing and preprocessing.
- Worked with clients to better understand their reporting and dashboard needs and present solutions using structured Waterfall and Agile project methodology approach.
- Understanding the business requirements and needs and drawing the road map for Big data initiatives.
- Coordinated with different teams on user issues and resolved them.
- Day-to-day responsibilities included resolving developer issues, performing deployments, moving code between environments, provisioning access for new users, providing immediate solutions to reduce impact, documenting them, and preventing future issues.
- Understood the nature of data from different OLTP systems and designed the ingestion processes into HDFS.
- Planned and prepared use cases for new Hadoop services and tested them on a sandbox by adding/installing the services through Ambari.
- Exported analyzed data to downstream systems using Sqoop for generating end-user reports, Business Analysis reports and payment reports.
- Performed Rolling and express upgrade from HDP 2.3.2 to HDP 2.4.3.
- Set up and managed high availability for NameNode, ResourceManager, HiveServer2, Hive Metastore, and Oozie to avoid single points of failure in large clusters.
- Created HIVE databases and granted appropriate permissions through Ranger policies.
- Designed data models in HBase and Hive.
- Applied partitioning and bucketing techniques in Hive to improve performance (see the sketch after this list).
- Optimized Hive and HBase queries.
- Designed HBase column schemas.
- Creating common data interface for Pig and Hive using HCatalog.
- Developed various data connections from data sources to SSIS and Tableau Server for report and dashboard development.
- Identified data sources, created source-to-target mappings, estimated storage, and provided support for cluster setup and data partitioning.
- Developed scripts for data ingestion using Sqoop and Flume, wrote Spark SQL and Hive queries for analyzing the data, and performed performance optimization.
- Wrote DDL and DML files to create and manipulate tables in the database
- Responsible for cleansing and validating data.
- Involved in loading data from UNIX file system to HDFS.
- Installed and configured Hive and wrote Hive UDFs.
- Used partitioning, dynamic partitions, and buckets in Hive to improve performance and data organization.
- Responsible for writing Hive Queries for analyzing data in Hive warehouse using Hive Query Language (HQL)
- Analyzed data using the Hadoop components Hive and Pig and created tables in Hive for the end users.
- Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.
- Devised procedures that solve complex business problems with due consideration for hardware/software capacity and limitations, operating times, and desired results.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Provided quick response to ad hoc internal and external client requests for data and experienced in creating ad hoc reports.
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, manage and review data backups, manage and review Hadoop log files.
- Worked hands on with ETL process.
- Handled importing of data from various data sources, performed transformations using Hive, MapReduce, and loaded data into HDFS.
- Extracted the data from Teradata into HDFS using Sqoop.
- Wrote REST Web services to expose the business methods to external services.
- Exported the patterns analyzed back into Teradata using Sqoop.
- Continuously monitored and managed the Hadoop cluster through Ambari.
- Installed the Oozie workflow engine to run multiple Hive jobs.
- Used the Oozie scheduler to automate the pipeline workflow and orchestrate the Sqoop, Hive, and Pig jobs that extract the data in a timely manner.
- Developed Hive queries to process the data and generate data cubes for visualization.
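A minimal sketch of the Hive partitioning and bucketing approach referenced above; table and column names are hypothetical, not the actual production schema:

    -- Hypothetical partitioned and bucketed Hive table: partitioning by load date
    -- prunes scans, and bucketing by customer_id helps joins and sampling.
    CREATE TABLE IF NOT EXISTS payments_curated (
        payment_id  BIGINT,
        customer_id BIGINT,
        amount      DECIMAL(18,2)
    )
    PARTITIONED BY (load_date STRING)
    CLUSTERED BY (customer_id) INTO 32 BUCKETS
    STORED AS ORC;

    -- Dynamic partition insert from a raw staging table (also hypothetical).
    SET hive.exec.dynamic.partition = true;
    SET hive.exec.dynamic.partition.mode = nonstrict;
    SET hive.enforce.bucketing = true;

    INSERT OVERWRITE TABLE payments_curated PARTITION (load_date)
    SELECT payment_id, customer_id, amount, load_date
    FROM payments_raw;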
Confidential | Hopkins, MN
Hadoop Developer
Responsibilities:
- Implemented pipelines for data ingestion into Hadoop.
- Worked with business stakeholders and translated business objectives and requirements into technical requirements and design.
- Designed, architected, and helped maintain scalable solutions on the big data analytics platform for Pharmacy.
- Created and maintained technical documentation for launching HDP clusters and for executing Hive queries and Pig Scripts
- Performed data ingestion using Flume, Pig, Hive, Sqoop and Oozie.
- Designed a data warehouse using Hive, creating and managing Hive tables (external/ORC); see the sketch after this list.
- Involved in loading and transforming large sets of structured, semi structured and unstructured data from relational databases into HDFS using Sqoop imports.
- Created tables in the RDBMS, inserted data, and then loaded the same tables into HDFS and Hive using Sqoop.
- Developed MapReduce jobs which are involved in data processing.
- Experienced in working with Sqoop imports and exports between RDBMS and HDFS.
- Automated the jobs for daily, weekly and monthly based on the requirement using Oozie.
- Implemented Hive UDFs to solve problems and improve performance.
- Experienced working with Parquet files in Impala and implemented multiple batch queries in Impala.
- Experienced in shell scripting and in rewriting existing Python modules to deliver data in particular formats.
- Worked with Teradata analysis team using Big Data technologies to gather the business requirements.
- Worked on running Hadoop Streaming jobs to process terabytes of XML-format data.
- Loaded data into the cluster from dynamically generated files using Flume and from the RDBMS using Sqoop.
- Created external Hive tables and worked on loading the data into Hive tables.
- Experienced in loading and transforming large datasets of structured, semi-structured and unstructured data.
- Responsible for extracting data from multiple databases using Sqoop.
- Experienced in streaming data using Kafka along with the Spark framework.
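A minimal, hypothetical sketch of the external/ORC Hive table pattern mentioned above; names and HDFS paths are illustrative, not the actual pharmacy schema:

    -- External table over raw files landed in HDFS (hypothetical location).
    CREATE EXTERNAL TABLE IF NOT EXISTS rx_claims_ext (
        claim_id   BIGINT,
        member_id  BIGINT,
        claim_date STRING,
        amount     DECIMAL(18,2)
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    STORED AS TEXTFILE
    LOCATION '/data/raw/rx_claims';

    -- Managed ORC table for the warehouse layer, loaded from the external table.
    CREATE TABLE IF NOT EXISTS rx_claims_orc
    STORED AS ORC
    AS SELECT * FROM rx_claims_ext;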
Confidential | Overland Park, KS
Hadoop Developer
Responsibilities:
- Gathering business requirements from the Business Partners and Subject Matter Experts.
- Imported and exported data between HDFS and SQL databases using Sqoop based on business need. Developed custom MapReduce jobs in Java for preprocessing and data cleaning.
- Created Hive tables and wrote Hive queries for data analysis to meet the business requirements.
- Responsible for loading data from UNIX file systems to HDFS
- Involved in creating HiveQL tables, loading them with data, and writing HiveQL queries that invoke and run MapReduce jobs in the backend.
- Developed Hive queries to aggregate the click-stream data that was imported into HDFS using Sqoop (see the sketch after this list).
- Developed Pig scripts that perform multiple aggregations on a single data set.
- Managing and scheduling batch Jobs on a Hadoop Cluster using Oozie.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Involved in daily SCRUM meetings to discuss the development/progress of Sprints and was active in making scrum meetings more productive.
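A minimal sketch of the kind of click-stream aggregation described above, written in HiveQL with hypothetical table and column names:

    -- Daily page views and distinct visitors per page, assuming an illustrative
    -- clickstream_events table imported into HDFS/Hive via Sqoop.
    SELECT
        to_date(event_time)        AS event_day,
        page_url,
        COUNT(*)                   AS page_views,
        COUNT(DISTINCT visitor_id) AS unique_visitors
    FROM clickstream_events
    GROUP BY to_date(event_time), page_url;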
Confidential | Overland Park, KS
Hadoop Developer
Responsibilities:
- Responsible for loading data from UNIX file systems to HDFS
- Involved in running Hadoop jobs for processing data coming from different sources.
- Handled importing of data from various data sources and performed transformations using Hive and MapReduce.
- Loaded data into HDFS and extracted data from SQL databases into HDFS using Sqoop.
- Involved in creating HiveQL tables, loading them with data, and writing HiveQL queries that invoke and run MapReduce jobs in the backend.
- Analyzed large and critical datasets using HDFS, HBase, MapReduce, Hive, Sqoop, and ZooKeeper.
- Strong knowledge of writing Hive scripts for ETL to perform transformations, event joins, traffic filtering, and pre-aggregations before storing the data in HDFS (see the sketch after this list).
- Exported the results of transaction and sales data aggregations and computations to the RDBMS using Sqoop.
- Worked on the performance tuning process to optimize Hive scripts, Pig scripts, and MapReduce programs.
- Worked with Admin team to tune the cluster level configurations.
- Created workflow and coordinator using Oozie for regular jobs and to automate the tasks of loading the data into HDFS
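A minimal HiveQL sketch of the join/filter/pre-aggregation pattern referenced above; table and column names are illustrative rather than the actual sales schema:

    -- Join transactions to a store dimension, filter out test traffic, and
    -- pre-aggregate by store and day before the results are exported via Sqoop.
    INSERT OVERWRITE TABLE sales_daily_agg
    SELECT
        t.store_id,
        s.region,
        to_date(t.txn_time) AS txn_day,
        SUM(t.amount)       AS total_sales,
        COUNT(*)            AS txn_count
    FROM transactions t
    JOIN stores s ON t.store_id = s.store_id
    WHERE t.is_test_txn = false
    GROUP BY t.store_id, s.region, to_date(t.txn_time);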
Confidential | Overland Park, KS
Senior ETL Developer
Responsibilities:
- Involved in creation/review of functional requirement specifications and supporting documents for business systems, experience in database design process and data modeling process.
- Extracted data from various sources such as SQL Server 2008, CSV, Excel, and text files from client servers and through FTP.
- Handled Performance Tuning and Optimization on SSIS, with strong analytical and troubleshooting skills for quick issue resolution in large-scale production environments located globally.
- The packages created included a variety of transformations, for example Lookup, Aggregate, Derived Column, Conditional Split, Multicast, and Data Conversion.
- Reviewed and tested packages, fixing bugs using SQL Server 2008 Business Intelligence Development Studio.
- Developed stored procedures, User Defined Functions and triggers.
- Developed packages to pull data from various data sources.
- Involved in tuning stored procedures for better performance.
- Created stored procedures extensively using CTEs (Common Table Expressions), temp tables, and dynamic SQL queries (see the sketch after this list).
- Created clustered and non-clustered indexes.
- Involved in scheduling and monitoring all job activities.
- Granted access to users on servers, databases, and applications.
- Worked on developing and modifying the physical database design.
- Scripted C#.NET in Script Tasks for data conversions in SSIS packages.
- Involved in database consistency checks and index fragmentation maintenance.
- Monitored database server performance and tuned it for better performance.
- Responsible for keeping downstream systems up to date.
- Extensively used joins and subqueries to simplify complex queries involving multiple tables.
- Participated in creating tables, indexes, constraints, triggers, procedures, views, user-defined data types, and user-defined functions.
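A minimal T-SQL sketch of the CTE-based stored procedure pattern mentioned above; the procedure, table, and column names are hypothetical:

    -- Hypothetical procedure returning the latest order per customer via a CTE.
    CREATE PROCEDURE dbo.usp_GetLatestOrders
        @CustomerId INT = NULL
    AS
    BEGIN
        SET NOCOUNT ON;

        WITH RankedOrders AS (
            SELECT  o.OrderId,
                    o.CustomerId,
                    o.OrderDate,
                    ROW_NUMBER() OVER (PARTITION BY o.CustomerId
                                       ORDER BY o.OrderDate DESC) AS rn
            FROM dbo.Orders AS o
            WHERE @CustomerId IS NULL OR o.CustomerId = @CustomerId
        )
        SELECT OrderId, CustomerId, OrderDate
        FROM RankedOrders
        WHERE rn = 1;
    END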
Confidential | Overland Park, KS
ETL Developer
Responsibilities:
- Support existing DW platform/applications to ensure system availability and scalability.
- Work directly with functional analysts and business users in understanding the information needs of the business and involved in developing new data warehouse/BI functionalities or tools to meet the requirements.
- Involved in writing stored procedures and views for business logic and functionality of various modules.
- Developed the package to handle massive amounts of data from files and the database and to maintain the tracking status of files through job tables.
- Handled errors in packages and provided error logging (see the sketch after this list).
- Created the jobs for package automation using SQL Server Agent.
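A minimal T-SQL sketch of the error-handling and logging approach referenced above, assuming a hypothetical ErrorLog table and load procedure:

    BEGIN TRY
        EXEC dbo.usp_LoadDailyFiles;   -- hypothetical load procedure called by the package
    END TRY
    BEGIN CATCH
        DECLARE @msg NVARCHAR(4000) = ERROR_MESSAGE();

        -- Record the failure in a hypothetical dbo.ErrorLog table.
        INSERT INTO dbo.ErrorLog (ErrorNumber, ErrorMessage, ErrorProcedure, LoggedAt)
        VALUES (ERROR_NUMBER(), @msg, ERROR_PROCEDURE(), GETDATE());

        RAISERROR(@msg, 16, 1);        -- re-raise so the SQL Server Agent job reports failure
    END CATCH;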
