Hadoop/Big Data Engineer Resume
Minneapolis, MN
SUMMARY:
- 5 years of experience in software development specializing in Data Modeling, ETL, SQL, and NoSQL areas.
- Expert in all activities related to the development, implementation, administration and support of ETL processes for large-scale data warehouses using SQL Server SSIS and Talend.
- Hands-on experience with the Hadoop stack (MapReduce, HDFS, Sqoop, Pig, Hive, HBase, Spark, Flume, Oozie and ZooKeeper).
- Well versed in configuring and administering Hadoop clusters using Cloudera.
- Proven expertise in performing analytics on Big Data using MapReduce, Hive and Pig.
- Experienced in performing real-time analytics on NoSQL databases such as HBase and Cassandra.
- Worked with the Oozie workflow engine to schedule time-based jobs that perform multiple actions.
- Strong understanding of the Software Development Lifecycle (SDLC), including the Waterfall and Agile methodologies.
- Good experience in developing projects using Talend Studio for Big Data.
- Developed Hive/MapReduce/Spark Python modules for ML & predictive analytics in Hadoop/Hive/Hue on AWS.
- Experience in importing data in different formats from RDBMS databases into HDFS and HBase, and exporting it back.
- Developed Pig Latin scripts for handling business transformations.
- Implemented Sqoop for large dataset transfers between Hadoop and RDBMS.
- Excellent working knowledge of SSAS, SSRS and Crystal Reports.
- Expert in creating, configuring and fine-tuning ETL workflows designed in SSIS and Talend.
- Excellent experience in using scheduling tools to automate batch jobs.
- Experience in using data modeling tools such as Erwin and Sybase PowerDesigner.
- Extensive experience in using SQL.
- Excellent Database experience in PostgreSQL, SQL Server 2012/2017, Oracle 8x/9i/10g, T-SQL and PL/SQL (Stored Procedures, Functions, Triggers).
- Good exposure to database performance evaluation and tuning.
- Experienced in developing applications with tools such as Microsoft BIDS and Visual Studio.
- Good exposure to Windows, Linux, and AWS cloud-related environments.
- Excellent communication and interpersonal skills.
- Able to manage changing responsibilities and deliver time-critical projects on schedule.
- Hands-on experience in developing ETL-related tasks in a Big Data environment leveraging Pig, Python, and Hive.
- Involved in the development of enterprise solutions in the Healthcare, Banking and Education domains.
TECHNICAL SKILLS:
Programming: Java, Python, JavaScript
Web Framework: AngularJS 2, Bootstrap (HTML5, CSS)
Mainframe Languages: COBOL, JCL
Database (RDBMS): SQL Server, DB2, PostgreSQL, MySQL
ETL: SSIS
Data Modeling: Erwin
Database (NoSQL): MongoDB, HBase, Cassandra, Neo4J
IDE: Eclipse, Visual Studio Code
Application Server: Tomcat, Django
Build Tools: Ant, Maven, Gradle, GitHub
Micro Service & RESTful: Spring, Swagger (API design)
Agile Methodology: Rational Unified Process (RUP), JIRA
Testing framework: JUnit
Operating Systems: Windows, Linux
Cloud: AWS
Big Data Ecosystem: Apache Hadoop, Cloudera Hadoop, HDFS, YARN, MapReduce, Pig, Hive, Kafka, Sqoop, Flume, Spark
PROFESSIONAL EXPERIENCE:
Confidential, Minneapolis, MN
Hadoop/Big Data Engineer
Responsibilities:
- Crawling and extracting relevant provider information from the web portals to MySQL staging database.
- Designing and developing Pig, Python, Hive and Java MapReduce programs to process massive amounts of provider information.
- Collaborating with the Solr team on index design.
- Developing PoC repositories in HBase, MongoDB, and Cassandra
- Developing automation jobs in Linux bash shell.
- Created ETL Mapping with Talend Integration Suite to pull data from Source, apply transformations, and load data into target database.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in HDFS.
- Developing purge and archive routines for HDFS file cleanup.
- Created stored procedures, views and triggers.
- Design, develop, implement, and assist in validating ETL processes
- Design and develop SSIS ETL workflow.
- Built an EC2 instance (RHEL).
- Migrating data to cloud.
Environment: Java, Python, Hadoop Ecosystem (Pig, Hive, HDFS, MapReduce, YARN, Cloudera Manager, Sqoop, Flume, ZooKeeper, Kafka, Impala), Amazon Web Services (AWS), NoSQL Databases (MongoDB, Cassandra, HBase), MySQL, Cloudera, JIRA, Linux Shell Scripting, SSIS, Talend Studio Data Integration.
Confidential
Student Developer
Responsibilities:
- Providing support for the newly developed website by making necessary changes as per design using HTML, CSS, JavaScript.
- Developing a database repository.
- Developing Python programs for data extraction and cleansing related jobs.
- Helping prospective international students with their transition to UHCL and answering their queries (by e-mail and phone) before their arrival in the United States.
- Organized New International Student Orientation for Spring-16, Summer-16 and Fall-16.
- Processing applications for International Student Advisors.
Environment: Bootstrap (HTML, CSS), JavaScript, Python, PostgreSQL, AngularJS.
Confidential
Application Developer
Responsibilities:
- Requirement analysis
- Creating Conceptual Schema and ERD diagrams
- Creating and maintaining the Physical Schema on MSSQL
- Translate designs into code for assigned tasks/projects.
- Ensure coding standards and guidelines are adhered to.
- Collaborating with Business Analysts and Quality Engineers for defect mitigation and bug fixing
- Production maintenance and support.
- Exploration Warehouse development for retail banking.
- Archival status reports using SSRS
- SSIS packages to load from mainframe datasets, external flat files
- Code optimization and query tuning
- Provide Automated Customer Account Transfer Service.
- Task automation design, development and deployment
- Developing data cleansing routines and parsers in Python
- API documentation
Environment: SSIS, SSRS, MSSQL, Erwin, Python, COBOL, JCL, VSAM, DB2, CICS (IBM Mainframe), Django.
Confidential
Database Developer
Responsibilities:
- Requirement analysis for Conceptual, Logical, and Physical Data Models
- Design and development of ETL and Metadata/MDM components using SSIS
- Design and develop data stream ETL for external data sources
- Perform data cleansing and quality checks using fuzzy lookups
- Design and develop control flows
- Automate the data consolidation process
- Prepare Master Data Management repositories such as Customers, Products and Services
- Work with QA team on fixing data quality or process bugs
- Perform ETL or query tuning for better performance
Environment: SSIS, Python, MongoDB, T-SQL, SQL Server 2012, Erwin
