Hadoop Developer Resume
Chicago, IL
SUMMARY:
- 8 years of experience, including 3 years in Big Data and Big Data analytics across the E-commerce, Education, Financials, and Healthcare domains and 5 years developing and implementing database applications using Oracle 10g/9i, SQL, and PL/SQL.
- Hands-on experience with Hadoop technologies such as HDFS, Hive, Sqoop, and Impala.
- Hands-on experience writing Hive and Pig scripts that run as MapReduce jobs.
- Experience importing and exporting data between different systems and the Hadoop file system using Sqoop.
- Used Hadoop ecosystem components to store and process data, and exported data into Tableau using a live connection.
- Experience creating databases, tables, and views using HiveQL, Impala, and Pig Latin.
- Strong knowledge of MapReduce concepts; around 1 year of experience with Spark and Scala.
- Hands-on experience working with ecosystem components such as Hive, Pig, and MapReduce.
- Strong knowledge of Hadoop, Hive, and Hive analytical functions. Efficient in building MapReduce programs using Hive and Pig.
- Involved in migrating data from different databases (SQL Server 2008 R2, Oracle, and MySQL) to the Hadoop stack.
- Successfully loaded files into Hive and HDFS from MySQL.
- Loaded datasets into Hive for ETL operations. Good knowledge of Hadoop cluster architecture and cluster monitoring.
- Good understanding of cloud configuration in Amazon Web Services (AWS). In-depth understanding of data structures and algorithms.
- Experience deploying applications on heterogeneous application servers: Tomcat, WebLogic, IBM WebSphere, and Oracle Application Server.
- Strong written, oral, interpersonal, and presentation skills. Able to perform at a high level, meet deadlines, and adapt to ever-changing priorities.
- Extensive work experience with different SDLC approaches such as Waterfall and Agile development methodologies.
- Able to identify and resolve problems independently and quickly.
- Moved data between HDFS and RDBMS using Sqoop. Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
- Installed and configured Hadoop cluster in Test and Production environments.
- Performed both major and minor upgrades to the existing CDH cluster. Commissioned and decommissioned nodes in the existing cluster.
- Analyzed and transformed data with Hive and Pig.
SKILLS:
APACHE HADOOP HDFS (2 years), Hadoop (2 years), Hadoop Distributed File System (2 years), Oracle (6 years), SQL (8 years)
TECHNICAL SKILLS:
Big Data Hadoop Stack: HDFS, MRv2 (YARN), Sqoop, Flume, Pig and Hive
Spark: Spark Core, Spark Streaming, Spark SQL
Machine Learning: Prediction, Classification, Clustering and Time series algorithms
NoSQL: HBase and MongoDB
Programming Language: Java, Python, R and Scala.
Analytics Tools: RStudio, Weka, Excel
RDBMS: MySQL, DB2 and Oracle
Reporting: Tableau, QlikView, D3.js and Excel
WORK EXPERIENCE:
Hadoop Developer
Confidential, Chicago, IL
Responsibilities:
- Managed and reviewed Hadoop log files. Tested raw data and executed performance scripts.
- Shared responsibility for administration of Hadoop, Hive and Pig.
- Developed Map Reduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
- Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data.
- Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes and resolved technical problems.
- Gained a clear understanding of the client's business requirements for the risk rating modules and report modules.
- Worked on setting up 2-node and 5-node clusters with the CDH3 distribution.
- Involved in data prediction analysis using the K-Means algorithm.
- Coordinated discussions with the customer and functional team as required to gather inputs.
- Worked closely with technology counterparts to communicate the business requirements. Contributed to application design and database design.
- Prepared technical design documents.
Environment: Java, Machine Learning, Cloudera, Apache Hadoop, HDFS, Hive, Pig, Apache Spark, Spark Streaming, Spark SQL, Scala, Git.
Hadoop Developer
Confidential, Chicago, IL
Responsibilities:
- Led the Big Data analytics solution project to load data from source systems through to the client's Modern Analytics Platform.
- Analyzed and ingested policy, claims, billing, and agency data into the client's solution through multiple stages.
- Wrote multiple MapReduce programs for extraction, transformation, and aggregation of data from different sources in multiple file formats, including XML, JSON, CSV, and other compressed file formats.
- Assisted with data capacity planning and node forecasting.
- Imported and exported data between Oracle and DB2 and HDFS and Hive using Sqoop, and automated the Sqoop jobs by scheduling them in Oozie.
- Created Hive scripts to load data from one stage into another and implemented incremental loads with the changed data architecture.
- Created Hive tables per requirements as internal or external tables, defined with appropriate static and dynamic partitions and bucketing for efficiency.
- Performed data analysis and ran Hive and Pig queries on Ambari (Hortonworks).
- Enhanced Hive performance by implementing optimization and compression techniques.
- Implemented Hive partitioning and bucketing to improve query performance in the staging layer, which is a denormalized form of the analytics model.
- Implemented techniques for efficient execution of Hive queries, such as map joins, compressed map/reduce output, and parallel query execution.
- Issued SQL queries via Impala to process the data stored in HDFS and HBase.
- Planned and reviewed the deliverables. Assisted the team in their development and deployment activities.
- Involved in cluster setup meetings with the administration team.
Environment: Apache Hadoop 2.2.0, Hortonworks, MapReduce, Hive, HBase, HDFS, Pig, Sqoop, Flume, Impala, Spark, Oozie, Kafka, MongoDB, UNIX, Shell Scripting, XML, JSON.
Hadoop Developer
Confidential, Oak Brook, IL
Responsibilities:
- Analyzed large datasets to provide strategic direction to the company.
- Involved in system and business analysis.
- Developed SQL statements to improve back-end communications.
- Loaded unstructured data into the Hadoop Distributed File System (HDFS).
- Created ETL jobs to load Twitter JSON data and server data into MongoDB and transferred the MongoDB data into the data warehouse.
- Created reports and dashboards using structured and unstructured data.
- Involved in importing data from MySQL to HDFS using Sqoop.
- Involved in writing Hive queries to load and process data in Hadoop File System.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Worked with Impala for data retrieval.
- Exported data from Impala to the Tableau reporting tool and created dashboards on a live connection.
- Performed sentiment analysis on product reviews from the client's website.
- Exported the resulting sentiment analysis data to Tableau for creating dashboards.
Environment: Cloudera, CDH4.3, Hadoop, MapReduce, HDFS, Hive, MongoDB, Sqoop, MySQL, SQL, Impala, Tableau.
Database Developer
Confidential, Northbrook, IL
Responsibilities:
- Write T-SQL statements and use local and global Temp Tables, Views, and CTEs to support data extraction
- Create Stored Procedures and Views to support reporting and interface data requirements
- Create SSIS Packages for interfaces from different end users and different departments, loading data into various output files
- Design daily/weekly/monthly SSRS reports utilizing multiple types of reports including sub reports, Drill Through/Drill Down, and Cascading Parameterized Reports
- Work with deployment environment at final stage and establish database connections to tables and datasets
- Provide migration and integration between SQL Server 2008 and 2012 under SQL Server Management Studio
- Involved in multiple Software Development Life Cycle (SDLC) phases, including requirement gathering from end users, providing requirement analysis, and designing data mapping documents
- Provide data cleansing, including removal of duplicate data, conforming date and time data type formatting, string editing, and data conversions
- Update queries and Stored Procedures for existing reports according to updated requirements, and provide maintenance for database environments
- Developed views that required complex joins while maintaining quick response times.
- Involved in writing complex queries for the project as required.
- Involved in loading flat files into database using SQL*Loader.
- Modified, tested, and debugged the PL/SQL packages, functions, procedures, and triggers according to the requirements.
- Developed/modified stored procedures, functions, triggers, views and synonyms to implement the business logic.
Environment: Oracle 9i, Windows XP, SQL*Loader, SQL Server 2008/2012, SQL Server Integration Services (SSIS), SQL Server Reporting Services (SSRS), MS Office 2007/2013, Windows 8/7, QlikView, Team Foundation Server.
Oracle PL/SQL Developer
Confidential, Chicago Heights, IL
Responsibilities:
- Analyzed Business requirements based on the Business Requirement Specification document.
- Loaded data into Oracle tables using SQL*Loader.
- Created PL/SQL stored procedures, functions, and packages for moving the data from the staging area to the database.
- Generated primary key values using Oracle SEQUENCE objects.
- Performed extensive query analysis and tuning using indexes and hints, and wrote numerous complex queries involving subqueries, correlated queries, UNION/UNION ALL, MINUS, inline SQL, and analytical functions.
- Developed program specifications for PL/SQL Procedures and Functions to do the data migration and conversion.
- Created a wide range of data types, tables, index types, and scoped variables.
- Designed the front end interface for the users, using Oracle Forms.
- Involved in database development by creating Oracle PL/SQL Functions, Procedures, Triggers, Packages, Records and Collections.
- Involved in development of the ETL process using SQL*Loader and PL/SQL packages.
- Developed and customized Forms/Reports using Oracle D2K.
- Designed Data layouts and Developer Reports using Oracle D2K.
- Implemented batch jobs (shell scripts) for loading database tables from Flat Files using SQL*Loader.
- Participated in Performance Tuning using Explain Plan.
- Created numerous database triggers using PL/SQL.
- Created UNIX shell and Perl scripts for data file handling and manipulations.
Environment: Oracle 9i/10g, SQL, PL/SQL, SQL*Plus, Oracle D2K, SQL*Loader.
SQL Developer
Confidential, Carbondale, IL
Responsibilities:
- Generated database SQL Scripts and deployed databases including installation and configuration
- Plan, design, and implement application database code objects, such as stored procedures and views.
- Build and maintain SQL scripts, indexes, and complex queries for data analysis and extraction.
- Provide database coding to support business applications using Sybase T-SQL.
- Perform quality assurance and testing of the SQL Server environment.
- Develop new processes to facilitate import and normalization, including data file for counterparties.
- Work with business stakeholders, application developers, and production teams and across functional units to identify business needs and discuss solution options.
- Ensure best practices are applied and integrity of data is maintained through security, documentation, and change management.
- Developed SQL Scripts to Insert/Update and Delete data in MS SQL database tables
- Wrote PL/SQL and developed and implemented stored procedures
- Developed complex SQL queries to perform efficient data retrieval operations including stored procedures, triggers etc.
- Built data connections to the database using MS SQL Server
- Used different joins, subqueries, and nested queries in SQL
- Worked with different sources such as Oracle, SQL and Flat files
- Worked on a project to extract data from XML files into SQL tables and generate data file reports using SQL Server 2008.
Environment: MySQL, SQL Server 2008 (SSRS, SSIS), Visual Studio 2000/2005, MS Excel.
