Hadoop Developer Resume
Chicago, IL
SUMMARY
- 7+ years of experience in the IT industry, including database development and Big Data technologies.
- 4+ years of working experience with Big Data technologies and the Hadoop stack.
- Experience working with HDFS, MapReduce, Spark, Hive, Pig, SQOOP, Flume, Kafka, Oozie and HBase.
- Good exposure to performance tuning of Hive queries, Pig scripts and SQOOP.
- Worked with various formats of files like delimited text files, click stream log files, Apache log files, Avro files, JSON files, XML Files.
- Expertise in inbound and outbound data transfer (importing/exporting) from/to traditional RDBMS using Apache SQOOP.
- Tuned Pig and Hive scripts by analyzing their joins, groupings and aggregations. Worked extensively with HiveQL and join operations, wrote custom UDFs, and have good experience optimizing Hive queries.
- Experience in data processing tasks such as collecting, aggregating and moving data from various sources using Apache Flume and Kafka.
- Experience in NoSQL databases like HBase and MongoDB.
- Good knowledge of Oracle 8i, 9i and 10g databases and excellent skills in writing SQL queries.
- Experience working with Oracle, DB2, SQL Server and MySQL databases.
- Experience using the SQL Server Import/Export Wizard to migrate heterogeneous data sources such as Oracle and MS Access databases, Excel and flat files to SQL Server.
- Experience in upgrading 10g and 11g databases.
- Database refresh from Production to Development and from Development to Test environments.
- Replication of tables across platforms and creation of materialized views.
- Skilled in data management, extraction, manipulation, validation and analysis of large volumes of data.
- Analytical, organized and enthusiastic about working in a fast-paced, team-oriented environment. Experienced in interacting with business users, understanding their requirements and providing solutions that match them.
- Good exposure to the Software Development Life Cycle.
- Excellent communication and interpersonal skills; flexible and adaptive to new environments; a self-motivated team player and positive thinker who enjoys working in a multicultural environment.
- Proactive in time management and problem solving, self-motivated, with good analytical skills.
- Analytical and organizational skills with the ability to multitask and meet deadlines.
TECHNICAL SKILLS
Primary skills: Java, C, Python, Scala, Shell Scripting
Hadoop Technologies: Hive, Pig, Spark, Impala, Kafka, SQOOP, HDFS, MapReduce, OOZIE, FLUME, Zookeeper, YARN
NoSQL Databases: MongoDB, HBase
Hadoop Distribution: Hortonworks, Cloudera, MapR
Databases: Oracle 10g, MySQL, MSSQL
IDE/Tools: Eclipse, NetBeans, Maven
Version control: Git, SVN, ClearCase
Platforms: Windows, Unix, Linux
Web/Server Application: Apache Tomcat, WebLogic, WebSphere, MSSQL Server, Oracle Server
PROFESSIONAL EXPERIENCE
Confidential, Chicago, IL
Hadoop Developer
Responsibilities:
- Responsible for creating Data store, Datasets and Virtual Warehouse in the lake and then creating Spark and Hive refiners to implement the existing SQL Stored Procedures.
- Implemented Enterprise Data Lake which provides a platform to manage data in a central location so that anyone in the firm can rapidly query, analyze or refine the data in a standard way.
- Involved in moving legacy data from the Sybase ASE data warehouse to the Hadoop Data Lake and migrating the data processing to the lake.
- Used SQOOP import and export functionalities to handle large data set transfer between Sybase database and HDFS.
- Created reconciliation jobs for validating data between source and lake.
- Created Hive refiners for simple UNIONs and JOINs.
- Worked on Sequence files, RC files, map-side joins, bucketing and partitioning for Hive performance enhancement and storage improvement.
- Worked on tuning Hive and Pig scripts to improve performance.
- Automated the triggering of Data Lake REST API calls using Unix shell scripting and Perl.
- Used Scala to test DataFrame transformations and debug data issues.
- Redesigned and implemented Scala REPL (read-evaluate-print-loop) to tightly integrate with other IDE features in Eclipse.
- Created Java-based Spark refiners to replace existing SQL stored procedures.
- Used Avro format for staging data and Parquet for final repository.
- Implemented daily jobs that automate parallel tasks of loading data into HDFS and pre-processing it with Pig, using Oozie coordinator jobs.
- Used REST services in Java and Spring to expose data in the lake.
- Involved in creating Oozie workflow and coordinator jobs to trigger jobs based on time and data availability.
- Used Eclipse and Ant to build the application.
- Performed unit testing and integration testing using the JUnit framework.
- Configured build scripts for multi-module projects with Maven and Jenkins.
- Added AppDynamics monitoring to JVM to gather statistics for REST application.
- Worked on the in-house data modeling service (PURE MODEL): consumed data from the data lake virtual warehouse and exposed the data model output through Java web services accessed by end users.
- Involved in story-driven Agile development methodology and actively participated in daily scrum meetings.
Environment: Hadoop, HDFS, Hortonworks, MapReduce, Pig, Hive, Spark, Scala, Oozie, SQOOP, Sybase, Kafka, Linux, Maven, JUnit, SVN, Talend.
Confidential
Hadoop Developer
Responsibilities:
- Handled large amounts of data coming from different sources and was involved in HDFS maintenance and loading of structured and unstructured data.
- Worked on importing data into and exporting data from HDFS and Hive using SQOOP.
- Ingested log data from various web servers into HDFS using Apache Flume.
- Implemented Flume agents for loading streaming data into HDFS.
- Developed MapReduce jobs in Java to perform data cleansing and pre-processing.
- Wrote several MapReduce jobs using the Java API.
- Migrated large amounts of data from various databases such as Oracle, Netezza and MySQL to Hadoop.
- Responsible for creating Hive tables, loading data into them and writing Hive queries.
- Performed data transformations in Hive.
- Wrote Hive queries to perform data analysis as per business requirements.
- Created partitions and buckets on Hive tables to improve performance while running Hive queries.
- Optimized and performance-tuned Hive queries.
- Implemented complex transformations by writing UDFs in Pig and Hive.
- Loaded and transformed structured, semi-structured and unstructured data.
- Scheduled jobs using the Oozie workflow engine.
- Good experience with Talend Open Studio for designing ETL jobs for data processing.
- Experience in processing large volumes of data and in parallel execution of processes using Talend functionality.
- Expertise in extracting, transforming and loading data from Oracle, DB2, SQL Server, MS Access, Excel, flat files and XML using Talend.
- Experienced in data analytics, web scraping and data extraction in Python.
- Designed and implemented database cloning using Python and built backend support for applications using shell scripts.
- Worked on various compression techniques like GZIP and LZO.
- Designed and implemented batch jobs using SQOOP, MR2, Pig and Hive.
- Implemented HBase on top of HDFS to perform real-time analytics.
- Handled Avro data files using Avro Tools and MapReduce.
- Developed data pipelines using chained mappers.
- Developed custom loaders and storage classes in Pig to work with various data formats like JSON, XML, CSV, etc.
- Actively involved in SDLC phases (design, development, testing), code reviews, etc.
- Actively involved in Scrum meetings and followed Agile methodology for implementation.
Environment: HDFS, MapReduce, Hive, Flume, Pig, Spark, Scala, SQOOP, Oozie, HBase, RDBMS/DB, Flat files, MySQL, CSV, Avro data files.
Confidential
Hadoop Developer
Responsibilities:
- Worked on Big Data Hadoop cluster implementation and data integration in developing large-scale system software.
- Assessed existing EDW (enterprise data warehouse) technologies and methods to ensure our EDW/BI architecture meets the needs of the business and enterprise and allows for business growth.
- Captured data from existing databases that provide MySQL interfaces using SQOOP.
- Worked extensively with SQOOP to export data from HDFS to relational database systems/mainframe and to import data from them into HDFS.
- Developed and maintained complex outbound notification applications that run on custom architectures, using diverse technologies including Core Java, J2EE, XML, JMS, JBoss and Web Services.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
- Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data.
- Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes and resolved technical problems.
- Managed and reviewed Hadoop log files.
- Tested raw data and executed performance scripts.
- Shared responsibility for administration of Hadoop, Hive and Pig.
- Developed Hive queries for the analysts, used the ETL tool Talend for processing and produced visualizations of transactional data.
- Supported business processes by developing, installing and configuring Hadoop ecosystem components that moved data from individual servers to HDFS.
- Created Cassandra tables to load large sets of structured, semi-structured and unstructured data coming from Linux, NoSQL and a variety of portfolios.
- Supported code/design analysis, strategy development and project planning.
- Assisted with data capacity planning and node forecasting.
- Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
- Handled structured and unstructured data and applied ETL processes.
Environment: Hadoop, HDFS, Hive, Cassandra, Hortonworks, IBM DataStage 8.1 (Designer, Director, Administrator), MySQL, Windows, Linux
Confidential
SQL Server Database Admin
Responsibilities:
- Created and maintained various databases for production, development and testing environments using SQL Server 2008R2 and 2005. Migrated SQL 2005 to SQL 2008R2.
- Performed daily backups and developed recovery procedures.
- Set up Active-Active/Active-Passive clustering in production and testing environments and troubleshot issues. Applied hot fixes and service packs in clustering environments.
- Conducted database mirroring in high performance and high availability mode and troubleshot issues with witness servers.
- Performed administration, database design, performance analysis, and production support for large (VLDB) and complex databases, up to 2 terabytes.
- Migrated data across different databases and different servers utilizing SSIS, BCP and import/export tools.
- Executed performance tuning using Windows Performance Monitor and SQL Server Profiler. Used SQL Server error logs and Event Viewer to troubleshoot issues.
- Implemented and configured database monitoring tools and focused on providing a high availability database.
- Troubleshot issues with full log files and database size growing beyond thresholds.
- Provided Change Control Management by identifying and controlling change requests to a project from both internal and external sources. Provided support to application developers on application databases.
- Managed database components of application releases and 24/7 production issues.
- Managed SSIS packages, stored procedures, jobs, and utilized Server-Side tracing to identify performance issues with deadlocks & other locking issues.
- Performed database consistency checks with DBCC, defragmentation and index tuning, and monitored error logs.
- Developed a few reports using SQL Server Reporting Services (SSRS).
- Created databases and database objects such as tables, stored procedures, views, triggers, rules, defaults, user-defined data types and functions.
- Performed refreshes from production backup to stage and development environments.
- Developed DR plans and automated failsafe backup remediation modules for accurate backup monitoring and reporting/Ad Hoc reporting.
- Collaborated with development team for performance enhancements, code reviews and provided vital inputs in terms of index optimization, schema structures, data changes, file layouts, etc.
- Extracted data from various sources such as SQL Server 2005/2008, Oracle, Teradata, CSV, Excel and Text file from client servers and through FTP.
- Developed and deployed stored procedures, created triggers, and audits.
- Developed jobs for maintenance by updating statistics, rebuilding indexes, scheduling full/diff/log backups, and refreshing database environments. Set up notifications for failure of jobs, etc.
- Determined database schema, database schema changes, and implementation plans for database schema releases in coordination with developers and data modelers in adherence with organization policies.
- Monitored, determined, and solved issues at SQL/database/server levels.
Environment: Windows 10/8/7, MS-SQL Server 2008R2, Visual Studio, i3 Performance Tool, SSIS, SSRS, SiteScope Monitoring Tool, IDERA Tool, VMware Machines, HTML, XML
Confidential
Jr. PL/SQL Database Programmer
Responsibilities:
- Developed packages, procedures, functions, and triggers for the application.
- Used the UNIX environment for testing.
- Wrote PL/SQL code based on the technical and functional specifications.
- Created Oracle objects such as tables, types, packages, procedures and functions.
- Wrote technical documents based on the functional specifications.
- Tested the newly coded procedures and documented the results.
- Automated sqlldr using shell scripting.
- Automated data fetch using UNIX shell script.
- Optimized the queries to improve the performance of the application.
Environment: Oracle, SQL*Plus, TOAD, SQL*Loader, Forms & Reports, Windows 2000 Professional, UNIX shell scripting.
