Big Data Developer Resume
SUMMARY
- Overall 10+ years of experience as a Big Data Engineer, including designing, developing, and implementing data models for enterprise-level applications and systems.
- Experience working with NoSQL databases (HBase, Cassandra, and MongoDB), including database performance tuning and data modeling.
- Expertise in writing Hadoop Jobs to analyze data using MapReduce, Apache Crunch, Hive, Pig, and Splunk.
- Good experience working with analysis tools such as Tableau for regression analysis, pie charts, and bar graphs.
- Experience in Data transformation, Data mapping from source to target database schemas, Data Cleansing procedures.
- Extensive experience in development of T-SQL, Oracle PL/SQL Scripts, Stored Procedures and Triggers for business logic implementation.
- Expertise in SQL Server Analysis Services (SSAS) and SQL Server Reporting Services (SSRS) tools.
- Involved in writing SQL queries and PL/SQL programs; created new packages and procedures, and modified and tuned existing procedures and queries.
- Good knowledge on Python Collections, Python Scripting and Multi-Threading.
- Wrote multiple MapReduce programs in Python for data extraction, transformation, and aggregation across multiple file formats.
- Extensive experience writing Storm topologies to consume events from Kafka producers and emit them into Cassandra.
- Excellent working knowledge of data modeling tools such as Erwin, Power Designer, and ER/Studio.
- Proficient working experience with big data tools such as Hadoop, Azure Data Lake, and AWS Redshift.
- Strong experience in Data Migration, Data Cleansing, Transformation, Integration, Data Import, and Data Export.
- Excellent technical and analytical skills with a clear understanding of design goals for OLTP development and dimensional modeling for OLAP.
- Strong experience in migrating data warehouses and databases into Hadoop/NoSQL platforms.
- Designed and developed Oracle PL/SQL and shell scripts for data conversion and data cleansing.
- Good knowledge of Amazon Web Services (AWS) concepts such as EMR and EC2; successfully loaded files into HDFS from Oracle, SQL Server, Teradata, and Netezza using Sqoop.
- Experienced in building data warehouses on the Azure platform using Azure Databricks and Data Factory.
- Extensive knowledge of IDE tools such as MyEclipse, RAD, IntelliJ, and NetBeans.
- Expert in Amazon EMR, S3, ECS, ElastiCache, DynamoDB, and Redshift.
- Experience in installing, configuring, supporting, and managing the Cloudera Hadoop platform, including CDH4 and CDH5 clusters.
- Experience in Dimensional Data Modeling, Star/Snowflake schema, FACT and Dimension tables.
- Good working knowledge of Apache NiFi as an ETL tool for batch and real-time processing.
TECHNICAL SKILLS
Big Data Ecosystem: Map Reduce, Spark 2.3, HBase 1.2, Hive 2.3, Pig 0.17, Solr 7.2, Flume 1.8, Sqoop 1.4, Kafka 1.0.1, Oozie 4.3, Hadoop 3.0, Apache Nifi 1.6, Cassandra 3.11
Cloud Management: Amazon Web Services (AWS), Amazon Redshift
OLAP Tools: Tableau, SAP BO, SSAS, Business Objects, and Crystal Reports 9
Programming Languages: SQL, PL/SQL, UNIX shell Scripting
Databases: Oracle 12c/11g, Teradata R15/R14, MS SQL Server 2016/2014, DB2.
Operating System: Windows 7/8/10, Unix, Sun Solaris
ETL/Data warehouse Tools: Informatica v10, SAP Business Objects Business Intelligence 4.2 Service Pack 03, Talend, Tableau, and Pentaho.
Methodologies: RAD, JAD, RUP, UML, System Development Life Cycle (SDLC), Agile, Waterfall Model.
Data Modeling Tools: Erwin Data Modeler, Erwin Model Manager, ER Studio v17, and Power Designer 16.6.
PROFESSIONAL EXPERIENCE
Confidential
Big Data Developer
Responsibilities:
- Worked on loading structured and semi-structured data into HDFS using Sqoop.
- Involved in copying large data sets from Amazon S3 buckets to HDFS.
- Used big data analysis and processing tools (Hive, Spark Core, and Spark SQL) for batch processing of large data sets on the Hadoop cluster.
- Implemented Spark SQL and Hive queries and performed transformations on DataFrames.
- Performed data aggregation operations using Spark SQL queries (a representative sketch follows this list).
- Implemented Hive Partitioning and bucketing for data analytics.
- Used Maven as the build tool.
- Used GitHub as code repository and version control system.
- Experienced with different scripting languages such as Python and shell scripting.
- Developed various Python scripts to find vulnerabilities with SQL queries by doing SQL injection, permission checks, and performance analysis.
- Designed and implemented RESTful web services using Spring and SolrJ to serve different platform clients such as web, iOS, and Android.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala, and Python.
- Involved in working with Sqoop to export the data from Hive to S3 buckets.
- Involved in writing Bash scripts to automate the Solr deployment, Logstash-forwarder, and Logstash, a background batch process for generating Solr node health and stats reports, and an on-demand process for indexing into Solr using the SolrJ API.
- Implemented big data operations on the AWS cloud: created clusters using EMR and EC2 instances, worked with S3 buckets, ran analytical operations on Redshift, performed RDS and Lambda operations, and managed resources using IAM.
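A minimal PySpark sketch of the kind of Spark SQL batch aggregation and partitioned Hive output described above; the input path, view, column, and table names are illustrative assumptions, not the actual project code.

```python
# Sketch only: dataset path, columns, and table names are assumed for illustration.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("batch-aggregation-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Read data previously landed in HDFS (e.g. via Sqoop or an S3 copy).
orders = spark.read.parquet("hdfs:///data/landing/orders")  # hypothetical path
orders.createOrReplaceTempView("orders")

# Spark SQL aggregation over the temporary view.
daily_totals = spark.sql("""
    SELECT order_date, customer_id, SUM(amount) AS total_amount
    FROM orders
    GROUP BY order_date, customer_id
""")

# Persist the result as a partitioned Hive table for downstream analytics.
(daily_totals.write
    .mode("overwrite")
    .partitionBy("order_date")
    .saveAsTable("analytics.daily_customer_totals"))
```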
Environment: HDFS, Apache Spark, Apache Hive, Scala, Oozie, Apache Kafka, Apache Sqoop, Agile Methodology, Amazon S3
Confidential
Sr. Big Data/Spark Developer
Responsibilities:
- Worked on Cloudera distribution.
- Involved in extracting customer data from various sources into the HDFS data lake, including data from relational databases (RDBMS) and CSV files.
- Loaded and transformed large sets of structured and semi-structured data using Spark.
- Involved in working with Sqoop for loading the data from RDBMS to HDFS.
- Extensively used Spark Core, Spark SQL.
- Used DataStage Designer to develop processes for extracting, cleansing, transforming, integrating, and loading data into staging tables.
- Used DataStage as an ETL tool to extract data from source systems and load it into the Oracle database.
- Monitored DataStage jobs daily by running UNIX shell scripts and force-started jobs whenever they failed.
- Created DataStage jobs using different stages such as Transformer, Aggregator, Sort, Join, Merge, Lookup, Data Set, Funnel, Remove Duplicates, Copy, Modify, Filter, Change Data Capture, Change Apply, Sample, Surrogate Key, Column Generator, and Row Generator.
- Developed loading routines and data extracts with Informatica, UNIX, SAS, and Oracle procedures.
- Developed Spark applications using Scala per business requirements.
- Used Spark DataFrame operations to perform required validations on the data.
- Responsible for performing sort, join, aggregation, filter, and other transformations on the datasets.
- Created Hive tables and worked on them for data analysis to meet the requirements.
- Implemented Hive Partitioning and bucketing for data analytics.
- Analyzed the data using HQL and Spark SQL.
- Conducted tuning of SQL and PL/SQL procedures, Informatica objects, and views.
- Loaded the cleansed data into Hive tables and applied analytical functions based on requirements.
- Involved in creating views for data security.
- Developed PySpark and Spark SQL code to process data in Apache Spark on Amazon EMR and perform the necessary transformations based on the STMs developed.
- Used PySpark SQL to load JSON data, create schema RDDs and DataFrames, and load them into Hive tables, and handled structured data using Spark SQL (a representative sketch follows this list).
- Implemented sample Spark programs in Python using PySpark.
- Involved in performance tuning of Spark applications.
- Worked on performance tuning operations in Hive.
- Created custom workflows to automate Sqoop jobs monthly.
- Involved in Agile methodologies, daily Scrum meetings, Sprint planning.
- Experienced in using version control tools such as GitHub to share code among team members.
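A brief PySpark sketch, under assumed file locations and table names, of the JSON-to-Hive loading pattern described above; it is illustrative only, not the actual project code.

```python
# Sketch only: the JSON location, columns, and Hive table name are assumptions.
from pyspark.sql import SparkSession, functions as F

spark = (SparkSession.builder
         .appName("json-to-hive-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Load semi-structured JSON into a DataFrame; Spark infers the schema.
events = spark.read.json("hdfs:///data/raw/events/")  # hypothetical path

# Example validation/transformation: drop malformed records and derive a date column.
clean = (events
         .filter(F.col("event_id").isNotNull())
         .withColumn("event_date", F.to_date("event_ts")))

# Persist as a Hive table so it can be queried with HQL or Spark SQL.
clean.write.mode("append").saveAsTable("staging.events_clean")
```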
Environment: HDFS, Hive, Apache Sqoop, Spark, Scala, YARN, Agile Methodology, Cloudera, MySQL.
Confidential - Shelton, CT
Sr. Big Data Engineer
Responsibilities:
- As a Sr. Big Data Engineer, provided technical expertise in Hadoop technologies as they relate to the development of analytics.
- Developed Hive queries to process the data and generate data for visualization.
- Responsible for installing, configuring, supporting, and managing Hadoop clusters.
- Involved in all phases of SDLC using Agile and participated in daily scrum meetings with cross teams.
- Conducted JAD sessions with management, vendors, users, and other stakeholders to resolve open and pending issues and develop specifications.
- Managed and reviewed Hadoop log files as a part of administration for troubleshooting purposes.
- Loaded data into Hive Tables from Hadoop Distributed File System (HDFS) to provide SQL-like access on Hadoop data.
- Used Erwin tool to develop a Conceptual Model based on business requirements analysis.
- Designed and developed architecture for a data services ecosystem spanning relational, NoSQL, and big data technologies.
- Experienced in developing Web Services with Python programming language.
- Designed the data marts in Erwin using Ralph Kimball's dimensional data mart modeling methodology.
- Used Apache Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
- Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files in AWS S3.
- Built and maintained SQL scripts, indexes, and complex queries for data analysis and extraction.
- Implemented Kafka high-level consumers to read data from Kafka partitions and move it into HDFS.
- Worked with Hadoop distributions such as Cloudera (CDH4 and CDH5), Hortonworks, and Amazon Web Services (AWS) for testing big data solutions and for monitoring and managing Hadoop clusters.
- Used Python to extract weekly information from XML files.
- Developed numerous MapReduce jobs in Scala for data cleansing and analyzing data in Impala.
- Collected large amounts of log data using Apache Flume and aggregated it in HDFS using Pig/Hive for further analysis.
- Generated various reports using SQL Server Reporting Services (SSRS) and SQL Server Integration Services (SSIS) for business analysts and the management team.
- Created AWS S3 buckets, managed bucket policies, and utilized S3 buckets for storage.
- Worked with NoSQL databases such as HBase, creating tables to load large sets of semi-structured data coming from source systems.
- Designed and developed Oracle and UNIX shell scripts for data import/export and data conversions.
- Wrote Python scripts to parse XML documents and load the data into the database (a representative sketch follows this list).
- Designed and implemented business intelligence solutions to support sales and operations functions and increase customer satisfaction.
- Developed and implemented different Pig UDFs to write ad-hoc and scheduled reports as required by the Business team.
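A short Python sketch of the XML parsing and database loading pattern mentioned above; the XML element names, table layout, and SQLite target are assumptions chosen to keep the example self-contained, not the production setup.

```python
# Sketch only: element names, table layout, and the SQLite target are assumed.
import sqlite3  # stand-in for the actual database driver used in production
import xml.etree.ElementTree as ET


def load_orders(xml_path: str, db_path: str) -> int:
    """Parse an XML feed and insert its records into a relational table."""
    tree = ET.parse(xml_path)
    rows = [
        (rec.findtext("id"), rec.findtext("customer"), rec.findtext("amount"))
        for rec in tree.getroot().iter("order")  # hypothetical element name
    ]
    # The with-block commits the transaction automatically on success.
    with sqlite3.connect(db_path) as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS orders (id TEXT, customer TEXT, amount TEXT)"
        )
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    return len(rows)


if __name__ == "__main__":
    print(load_orders("weekly_feed.xml", "orders.db"))
```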
Environment: Hive 2.3, Agile, Hadoop 3.0, HDFS, Erwin 9.7, NoSQL, AWS, MapReduce, Kafka, Scala, SSRS, SSIS, HBase 1.2, Python