Hadoop Developer Resume
California
PROFESSIONAL SUMMARY:
- 4+ years of hands-on experience in Hadoop, HDFS, MapReduce, and the Hadoop ecosystem.
- 4 years of experience with SQL database systems.
- Excellent experience exporting and importing data from RDBMS to HDFS, Hive tables, and HBase using Sqoop.
- Worked with different file formats such as JSON, XML, Avro, and plain text files.
- Good understanding and knowledge of NoSQL databases such as HBase.
- Excellent experience in monitoring, development, and support activities for Hadoop and Java/J2EE technologies.
- Experience running Hadoop in standalone, pseudo-distributed, and fully distributed modes.
- Good knowledge of AWS cloud computing services such as EC2 and S3, which provide fast and efficient processing of big data.
- Used Pig Latin scripts, join operations, and custom user-defined functions (UDFs) to perform ETL operations.
- Experienced in applying MapReduce design patterns to solve complex data-processing problems.
- Extensive experience writing custom MapReduce programs for data processing and UDFs for both Hive and Pig in Java.
- Good experience analyzing large data sets by writing Pig scripts and Hive queries.
- Worked on custom Pig loaders and storage classes to work with a variety of data formats such as JSON and compressed CSV.
- Great team player and quick learner with effective communication, motivation, and organizational skills, combined with attention to detail and a focus on business improvements.
- Ability to transform complex business requirements into technical specifications.
- Data processing, Big Data Analytics, Installation, Configuration & Testing, Product Design & Development.
- System Architecture Support, Client Relationship Management, Project Management, Quality Assurance.
- Research Reporting & Documentation, Leadership & Team Management, Strategy, Service-Oriented Architecture.
TECHNICAL SKILLS:
Big Data Ecosystem: Hadoop, Hive, Flume, MapReduce, Spark, Pig, HBase, Cassandra, Oozie, Kafka, AWS, Zookeeper, Impala.
Languages: Python, Java, Node.js, HTML, CSS, JavaScript
Software: Apache, Nginx
Database: MySQL, MongoDB, Neo4J
PROFESSIONAL EXPERIENCE:
Confidential, California
Hadoop Developer
Responsibilities:
- Involved in the complete SDLC of a Big Data project, including requirement analysis, design, coding, testing, and production.
- Extensively used Sqoop to import/export data between RDBMS and Hive tables, performed incremental imports, and created Sqoop jobs that resume from the last saved value.
- Involved in implementing the data preparation solution responsible for data transformation as well as handling user stories.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python, and Scala.
- Performed various benchmarking steps to optimize the performance of Spark jobs and improve overall processing.
- Prepared Pig scripts to build denormalized JSON documents, which were then loaded into Elasticsearch.
- Created Hive target tables using HQL to hold the data after all Pig ETL operations.
- Hands-on experience accessing HBase data and performing CRUD operations against it (a minimal sketch follows this list).
- Wrote shell scripts to automate the process, scheduling and invoking the scripts from the scheduler.
- Wrote rules in Hive to predict members with various ailments and their primary care providers; reports are pushed to Elasticsearch.
- Closely collaborated with both the onsite and offshore teams.
- Participated in IT sprint planning and project prioritization sessions when applicable.
- Responsible for the design, development, and implementation of mappings/transformations based on source-to-target specifications, and for defining ETL (extract, transform, load) development standards and conventions.
- Worked with a team of developers to design, develop, and implement a BI solution for Sales, Product, and Customer KPIs.
- Worked with business analysts to identify and understand source data systems.
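A minimal sketch of the HBase CRUD access referenced above, assuming a hypothetical "members" table with an "info" column family; the actual tables, row keys, and column names belonged to the project:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Minimal CRUD sketch against HBase; table and column-family names are placeholders.
public class HBaseCrudSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();   // picks up hbase-site.xml from the classpath
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("members"))) {

            // Create/Update: write one cell
            Put put = new Put(Bytes.toBytes("member-001"));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("pcp"), Bytes.toBytes("dr-smith"));
            table.put(put);

            // Read: fetch the cell back
            Result result = table.get(new Get(Bytes.toBytes("member-001")));
            String pcp = Bytes.toString(result.getValue(Bytes.toBytes("info"), Bytes.toBytes("pcp")));
            System.out.println("primary care provider = " + pcp);

            // Delete: remove the row
            table.delete(new Delete(Bytes.toBytes("member-001")));
        }
    }
}
```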
Confidential, New York
SQL Developer
Responsibilities:
- Wrote complex SQL queries and optimized them to pull the required data effectively (a minimal JDBC sketch follows this list).
- Automated several critical reports and created reusable utilities using Unix shell scripting to reduce the number of repetitive requests from users.
- Coordinated with partners both inside and outside the company to understand business needs, perform the required data analysis, and provide valuable insights.
- Created an end-to-end automated process to run each unit through a rule engine before shipping to double-check product quality, which is critical to the business.
- Performed ad hoc data analysis involving extensive data profiling.
- Provided data visualizations to users using Tableau.
- Developed and implemented data collection systems and other strategies that optimize statistical efficiency and data quality.
- Interpreted data, analyzed results using statistical techniques, and provided ongoing reports.
- Acquired data from primary or secondary data sources and maintained databases/data systems.
- Identified, analyzed, and interpreted trends and patterns in complex data sets.
- Worked closely with management to prioritize business and information needs.
- Enabled Tableau actions to drill down from dashboards to worksheets.
- Actively participated in clarification meetings with the clients.
- Involved in testing the SQL scripts for report development, Tableau reports, dashboards, and scorecards, and handled performance issues effectively.
- Tested dashboards to ensure data matched business requirements and to catch any changes in the underlying data.
- Exceptional ability to build productive relationships with colleagues.
- Involved in the complete Software Development Life Cycle (SDLC) by analyzing business requirements and understanding the functional workflow of information from source systems to destination systems.
- Developed Tableau visualizations and dashboards using Tableau Desktop.
- Worked with business analysts to identify and understand source data systems.
- Created action filters, parameters and calculated sets for preparing dashboards and worksheets in Tableau.
- Created complex SSIS packages to extract data and scheduled jobs to call the packages using SQL Server Agent.
- Analyzed data and generated ad hoc reports on the SQL database; prepared user manuals and provided training to end users on the open-item section, which includes all up-to-date open items and corresponding answers for the business.
- Approved, scheduled, planned, and supervised the installation and testing of new products and improvements to computer systems, such as the installation of new databases.
- Designed stored procedures using dynamic SQL.
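A minimal JDBC sketch of the kind of parameterized reporting query mentioned above; the connection URL, credentials, and the sales table/columns are hypothetical placeholders, not the actual reporting schema:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Parameterized reporting query sketch; requires the JDBC driver on the classpath.
public class ReportQuerySketch {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:sqlserver://db-host:1433;databaseName=reporting";  // hypothetical
        try (Connection conn = DriverManager.getConnection(url, "report_user", "secret");
             PreparedStatement ps = conn.prepareStatement(
                 "SELECT region, SUM(amount) AS total "
               + "FROM sales WHERE order_date >= ? GROUP BY region")) {
            ps.setDate(1, java.sql.Date.valueOf("2015-01-01"));  // bind the cutoff date safely
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString("region") + " -> " + rs.getBigDecimal("total"));
                }
            }
        }
    }
}
```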
Confidential, PA
Big Data Analyst
Environment: Eclipse, Servlets, JSPs, HTML, CSS, JavaScript, JQuery, SQL, JDBC
Responsibilities:
- Involved in design and development phase of the Software Development Life Cycle (SDLC).
- Involved in functional requirements review and creating technical design documents and Integration Solution Design documents.
- Worked on analyzing the Hadoop cluster and different Big Data analytic tools, including Pig, the HBase database, and Sqoop.
- Responsible for building scalable distributed data solutions using Hadoop.
- Managed and scheduled jobs on a Hadoop cluster.
- Installed and configured Hive and wrote Hive UDFs (an illustrative sketch follows this list).
- Supported setting up the QA environment and updating configurations for implementing scripts with Pig and Sqoop.
- Automated scripts to monitor HDFS and HBase through cron jobs.
- Prepared a multi-cluster test harness to exercise the system for performance and failover.
- Developed a high-performance cache, making the site stable and improving its performance.
- Created a complete processing engine, enhanced for performance.
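An illustrative sketch of a Hive UDF of the kind mentioned above; the class name and the normalization behavior are assumptions for illustration, not the original UDFs:

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Illustrative Hive UDF: trims and lower-cases a string column.
// Registered in Hive with: ADD JAR <path>; CREATE TEMPORARY FUNCTION normalize_str AS 'NormalizeString';
public class NormalizeString extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;  // pass NULLs through unchanged
        }
        return new Text(input.toString().trim().toLowerCase());
    }
}
```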
Confidential
SQL Developer
Environment: MySQL, Ubuntu, Oracle, Shell Scripting, Elasticsearch.
Responsibilities:
- Worked with a team of developers to design, develop, and implement a BI solution for Sales, Product, and Customer KPIs.
- Created action filters, parameters, and calculated sets for preparing dashboards and worksheets in Tableau.
- Managed to learn and apply new technologies quickly.
- Created complex SSIS packages to extract data and scheduled jobs to call the packages using SQL Server Agent.
- Approved, scheduled, planned, and supervised the installation and testing of new products and improvements to computer systems, such as the installation of new databases.
- Overcame challenges of storing and processing voluminous structured/semi-structured data via the Big Data Hadoop framework.
- Transferred data into HDFS and analyzed user activities on the platform to surface top-rated links via Hive and MapReduce.
- Converted semi-structured XML data into a structured format to enable further processing using MapReduce (a minimal sketch follows this list).
- Deployed Pig to fragment data into category-based and ratings-based sets and executed Hive queries for further analysis.
- Delivered the output into the RDBMS via Sqoop and achieved real-time processing for the website on a Python-based server.
- Experience working in WAMP (Windows, Apache, MySQL, Python) and LAMP (Linux, Apache, MySQL, Python) architectures; worked on various applications using Python-integrated IDEs such as Eclipse, PyCharm, and NetBeans.
- Performed code reviews and implemented Pythonic best practices; good experience handling errors/exceptions and debugging issues in large-scale applications.
- Knowledge of creating, modifying, and dropping Teradata objects such as tables, views, join indexes, triggers, macros, procedures, and databases.
- Worked on data migration from RDBMS to Hadoop and the AWS cloud; good knowledge of Spark platform parameters such as memory, cores, and executors; hands-on experience with AWS services such as Redshift, IAM, EMR, EC2, S3, RDS, DynamoDB, Data Pipeline, and Lambda.
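A minimal MapReduce sketch of the XML-to-structured conversion described above, assuming each input line holds one self-contained record; the <id> and <rating> field names are hypothetical:

```java
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Map-only sketch: extracts <id> and <rating> from one-line XML records and emits tab-separated rows.
public class XmlToTsvMapper extends Mapper<LongWritable, Text, NullWritable, Text> {
    private static final Pattern ID = Pattern.compile("<id>(.*?)</id>");
    private static final Pattern RATING = Pattern.compile("<rating>(.*?)</rating>");

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        Matcher id = ID.matcher(line);
        Matcher rating = RATING.matcher(line);
        if (id.find() && rating.find()) {
            // emit "id<TAB>rating" so downstream Hive/Pig jobs can read a structured layout
            context.write(NullWritable.get(), new Text(id.group(1) + "\t" + rating.group(1)));
        }
    }
}
```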
Confidential
Hadoop Developer
Environment: Windows, Microsoft VB 6.0, Hive, Spring, MapReduce, PGSQL, Pig, Sqoop, Hadoop, SQL Server.
Responsibilities:
- Analyzed datasets containing 70 lakh+ (7 million+) customer-complaint entries to effectively enhance the rate of issue resolution.
- Deployed the Hadoop framework to analyze customer attributes and fragment complaints based on client requirements.
- Filtered the dataset to categorize complaints on the basis of product, location, rate of resolution, etc. (a minimal counting sketch follows this list).
- Deployed Pig, Hive, and MapReduce for data processing and Flume to transfer log files from multiple sources to HDFS.
- Designed an application to track a patient's physical activities via multiple sensors.
- Enabled key stakeholders to predict the quality of stocks for optimizing gains and bringing clarity to the broad-level vision.
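A minimal MapReduce sketch of the complaint-categorization step above, counting complaints per product from comma-separated records; the column layout (product in the second field) is an assumption for illustration:

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Counts complaints per product category from CSV-like records; the column order is assumed.
public class ComplaintsByProduct {

    public static class ProductMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);

        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",");
            if (fields.length > 1) {
                ctx.write(new Text(fields[1].trim()), ONE);  // assume the second column holds the product
            }
        }
    }

    public static class CountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text product, Iterable<IntWritable> counts, Context ctx)
                throws IOException, InterruptedException {
            int total = 0;
            for (IntWritable c : counts) {
                total += c.get();  // sum the per-record ones into a per-product count
            }
            ctx.write(product, new IntWritable(total));
        }
    }
}
```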