Big Data Developer Consultant Resume
Charlotte, North Carolina
PROFESSIONAL SUMMARY:
- Over 8 years of professional IT experience, including the Big Data ecosystem and Data Science.
- Excellent experience with Hadoop architecture and its components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
- Analyzed and processed complex data sets using advanced querying, visualization and analytics tools.
- Identified, measured and recommended improvement strategies for KPIs across all business areas
- Experience using Talend Integration Suite (6.1/5.x) / Talend Open Studio (6.1/5.x)
- Experience with Talend Admin Console (TAC).
- Sound exposure to the retail market, including Retail Delivery Systems, as well as financial and medical insurance environments.
- Managed Amazon Redshift clusters, including launching clusters with specified node configurations and running data analysis queries.
- Managed a Kafka cluster on Hortonworks for streaming IoT data.
- Experience using cloud components and connectors to make API calls for accessing data from cloud storage (Google Drive, Salesforce, Amazon S3, Dropbox) in Talend Open Studio.
- Hands-on experience in installing, configuring, and using Hadoop ecosystem components such as MapReduce, HDFS, HBase, Hive, Sqoop, Pig, ZooKeeper, and Flume.
- Worked on ORC, Avro, and Parquet formats and their compression codecs.
- Good exposure to Apache Hadoop MapReduce programming, Pig scripting, distributed applications, and HDFS.
- Good knowledge of Hadoop cluster architecture and cluster monitoring.
- In-depth understanding of data structures and algorithms.
- Applied performance tuning techniques for Hive and Impala tables.
- Experience in writing Python scripts for data ingestion into Hadoop from various sources (see the sketch after this list).
- Excellent understanding and knowledge of NoSQL databases such as MongoDB, HBase, and Cassandra.
- Experience using the Airflow platform to monitor and retrofit data pipelines.
- Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa.
- Experience in managing Hadoop clusters using the Cloudera Manager tool.
- Experience working with the MapR Control System (MCS) tool in the MapR Hadoop distribution.
- Very good experience across the complete project life cycle: design, development, testing, and implementation of client-server and web applications.
- Experience in administering Red Hat Linux: installation, configuration, troubleshooting, security, backup, performance monitoring, and fine-tuning.
- Extensive experience working with Oracle, DB2, SQL Server, and MySQL databases.
- Hands-on experience with VPN, PuTTY, WinSCP, Cyberduck, FileZilla, etc.
- Wrote scripts to deploy monitors and checks and to automate critical system administration functions.
- Hands-on experience in application development using Java, RDBMS, and Linux shell scripting.
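Illustrative sketch of the Python-to-Hadoop ingestion pattern mentioned above, assuming a WebHDFS endpoint and the hdfs client library; the NameNode URL, user, and directory paths are placeholder assumptions, not values from any client environment.

```python
# Minimal sketch: push local CSV extracts into HDFS over WebHDFS.
# The endpoint, user, and both directories are illustrative placeholders.
from pathlib import Path

from hdfs import InsecureClient  # pip install hdfs

NAMENODE_URL = "http://namenode.example.com:9870"  # assumed WebHDFS endpoint
HDFS_LANDING_DIR = "/data/landing"                 # assumed HDFS target directory
LOCAL_SOURCE_DIR = Path("/var/data/exports")       # assumed local source directory


def ingest_csv_files() -> None:
    """Upload every CSV in the local source directory to the HDFS landing zone."""
    client = InsecureClient(NAMENODE_URL, user="hadoop")
    client.makedirs(HDFS_LANDING_DIR)
    for local_file in LOCAL_SOURCE_DIR.glob("*.csv"):
        target = f"{HDFS_LANDING_DIR}/{local_file.name}"
        client.upload(target, str(local_file), overwrite=True)
        print(f"Uploaded {local_file} -> {target}")


if __name__ == "__main__":
    ingest_csv_files()
```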
SKILLS:
Hadoop Ecosystem Development: Sqoop, MapReduce, Hive, Pig, Flume, Oozie, Zookeeper, HBase, Spark, Storm, Kafka, Drill, HDFS
Data Science Tools: RStudio, Zeppelin, Jupyter
ETL Tools: Talend, Datameer
Operating Systems: Linux, Windows XP, Windows Server 2003, Windows Server 2008, Red Hat, Ubuntu
Databases: MySQL, Oracle, MS SQL Server, DB2, MS Access, HBase, MongoDB
Languages: Python, SQL, Pig Latin, UNIX shell scripting
Cloud: AWS, Redshift, EMR, EC2, S3, VPC, VPN, Load Balancer, Azure, HDInsight
PROFESSIONAL EXPERIENCE:
Confidential, Charlotte, North Carolina
Big Data Developer Consultant
Responsibilities:
- Launched a 5-node MapR 5.2 cluster with 80 cores and 600 GB of RAM on AWS to implement a data recommendation engine.
- Installed Spark, Hive, Oozie, RStudio, and Zeppelin on the MapR Hadoop cluster.
- Mentored sophisticated organizations on large-scale data and analytics using advanced statistical and machine learning models.
- Architected and implemented analytics and visualization components for a device data analysis platform to predict hardware failures.
- Wrote Python and Scala APIs to move data from the client's RDBMS to AWS S3, implemented on the Spark execution framework (see the first sketch after this list).
- Worked intensively with Spark Shell and PySpark.
- Experience in performance tuning and query optimization in AWS Redshift
- Performed data manipulations using various Talend components such as tMap, tJavaRow, tJava, tOracleRow, tOracleInput, tOracleOutput, tMSSqlInput, and many more.
- Worked on the Talend RTX ETL tool; developed and scheduled jobs in Talend Integration Suite.
- Responsible for tuning ETL mappings, workflows, and the underlying data model to optimize load and query performance.
- Created Spark DataFrames while loading data into Hive tables.
- Used Avro and Parquet formats with compression codecs in Hive and Impala.
- Used a Kafka cluster to load IoT data onto the Hadoop cluster and designed an IoT data lake around a Kafka cluster on Azure.
- Created a VPG connection from the client network to the Hadoop cluster on AWS.
- Used Sqoop to build the data lake from MS SQL Server to the Hadoop cluster and scheduled jobs using crontab and Oozie.
- Built a data pipeline from an FTP server to Hadoop using a Python API (see the second sketch after this list).
- Connected Hive tables to RStudio and Zeppelin through sparklyr and JDBC connections, and gave the data science team the data infrastructure access needed to run their models in notebooks on the AWS cloud.
- Connected Hive tables on the Hadoop cluster to visualization tools such as Power BI and Tableau using ODBC and the MapR Drill connector.
- Used S3 as backup storage for the Hadoop cluster.
- Implemented a POC for launching Hadoop and Spark clusters on HDInsight.
- Developed Spark jobs for the recommendation engine and validated results using Python scripts.
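First sketch, illustrating the RDBMS-to-S3 pattern described in this role: a PySpark job reads a table over JDBC and writes it to S3 as Parquet. The JDBC URL, credentials, table, bucket, and partitioning column are assumptions, and the appropriate JDBC driver jar must be on the Spark classpath.

```python
# Minimal PySpark sketch: read an RDBMS table over JDBC and land it on S3 as Parquet.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("rdbms-to-s3")
    .getOrCreate()
)

# Assumed source: an MS SQL Server table exposed over JDBC.
source_df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://rdbms.example.com:1433;databaseName=sales")
    .option("dbtable", "dbo.orders")
    .option("user", "etl_user")
    .option("password", "***")
    .option("numPartitions", 8)             # parallel JDBC reads
    .option("partitionColumn", "order_id")  # numeric column to split on
    .option("lowerBound", 1)
    .option("upperBound", 10_000_000)
    .load()
)

# Assumed target: an S3 bucket backing the data lake, written as Parquet.
source_df.write.mode("overwrite").parquet("s3a://example-datalake/raw/orders/")
```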
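Second sketch, illustrating the FTP-to-Hadoop pipeline: ftplib pulls files from the FTP server and the hdfs client library writes them over WebHDFS. Host names, credentials, and paths are placeholder assumptions.

```python
# Minimal sketch: copy every file in an FTP directory into HDFS over WebHDFS.
import ftplib
import io

from hdfs import InsecureClient  # pip install hdfs

FTP_HOST = "ftp.example.com"
FTP_DIR = "/outbound"
NAMENODE_URL = "http://namenode.example.com:9870"
HDFS_TARGET_DIR = "/data/ftp_drops"


def ftp_to_hdfs() -> None:
    hdfs_client = InsecureClient(NAMENODE_URL, user="hadoop")
    hdfs_client.makedirs(HDFS_TARGET_DIR)

    with ftplib.FTP(FTP_HOST, user="ftpuser", passwd="***") as ftp:
        ftp.cwd(FTP_DIR)
        for name in ftp.nlst():
            buffer = io.BytesIO()
            ftp.retrbinary(f"RETR {name}", buffer.write)  # download into memory
            buffer.seek(0)
            # Stream the downloaded bytes straight into an HDFS file.
            with hdfs_client.write(f"{HDFS_TARGET_DIR}/{name}", overwrite=True) as writer:
                writer.write(buffer.read())


if __name__ == "__main__":
    ftp_to_hdfs()
```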
Environment: AWS, Spark, Python, Azure, Jupyter, Zeppelin, Talend Open Studio (TOS), Talend 6.1/5.6, Redshift, Hadoop, Sqoop, Scala, Hive, Flume, HBase, Kafka, Pig, Java, Shell Scripting, Unix, MySQL, MS SQL, Ubuntu, Zookeeper
Confidential, Jacksonville, Florida
Hadoop Developer
Responsibilities:
- Created Hive and HBase tables using the ORC file format and Snappy compression.
- Integrated Apache Kafka for data ingestion
- Set up and installed a 20-node HDP cluster used to run the client's matching algorithm; migrated 100 TB of data and all Spark and Hive jobs.
- Wrote different Pig scripts to clean up the ingested data and created partitions for the daily data.
- Conducted and moderated a meetup for 60 big data enthusiasts from the medical equipment industry, educating them on HDP and Hadoop.
- Involved in an R&D project to develop a business intelligence/reporting tool for Florida Blue.
- Implemented partitioning and bucketing for Hive tables based on requirements (see the sketch after this list).
- Performed various data ingestion tasks from DB2 to HDFS using the Sqoop JDBC connector.
- Created shell scripts to parameterize the Pig, Hive actions in Oozie workflow.
- Designed and implemented Incremental Imports into Hive tables.
- Created Hive tables in different formats; evaluated ORC and Avro and chose ORC as the best fit for the requirements.
- Managed and reviewed the Hadoop log files.
- Provisioned and managed a multi-tenant Cassandra cluster in a public cloud environment.
- Created Hive external and managed tables, compared their performance, and settled on external tables.
- Created and maintained Technical documentation for launching Hadoop Clusters and for executing Pig scripts.
- Worked on aggregating the data using advanced aspects of Map-Reduce.
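Illustrative sketch of the partitioning and bucketing pattern referenced above, expressed with the PySpark DataFrame writer against the Hive metastore; the database, table, column names, and bucket count are assumptions, and Spark records its own bucketing metadata, which is not identical to native Hive bucketing.

```python
# Minimal PySpark sketch: write a DataFrame as a partitioned, bucketed table
# registered in the Hive metastore.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-partitioning-example")
    .enableHiveSupport()  # register tables in the Hive metastore
    .getOrCreate()
)

# Assumed staging table holding the raw daily load.
claims = spark.table("claims_db.claims_staging")

(
    claims.write
    .partitionBy("load_date")   # prune daily queries to a single partition
    .bucketBy(32, "member_id")  # speed up joins/filters on member_id
    .sortBy("member_id")
    .format("orc")
    .mode("overwrite")
    .saveAsTable("claims_db.claims_orc")
)
```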
Environment: Kafka, Hortonworks, Hadoop, MapReduce, HDFS, Hive, Hortonworks/Cloudera Hadoop distributions, Pig, HBase, Linux, XML, MySQL, Java 6, Eclipse, Oracle 10g, PL/SQL, SQL*Plus, Python
Confidential, Mclean, Virginia
Hadoop Developer
Responsibilities:
- Imported data from Teradata systems to AWS S3 using Data Transfer and MapReduce.
- Experience with Hadoop data ingestion using the DataStage ETL tool and Hadoop transformations.
- Worked on the built-in Quantum application, where workflows were run as Spark applications.
- Experienced with Linux operating system and shell scripting.
- Worked on Apache Spark for real time and batch processing.
- Developed MapReduce code in Java and Spark SQL/Streaming jobs for faster testing and processing of data.
- Used Kibana and Elasticsearch to handle log messages produced by multiple systems.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Implemented MongoDB and set up Mongo components to write data to MongoDB and S3 simultaneously.
- Used GitHub as the version control tool.
- Developed scalable modular software packages for various APIs and applications.
- Developed a data pipeline using Kafka and Storm to store data in HDFS (see the sketch after this list).
- Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce.
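Simplified sketch of the Kafka-to-HDFS leg of the pipeline referenced above (the Storm topology itself is omitted), using kafka-python and the hdfs client library; broker, topic, group, and path names are placeholder assumptions.

```python
# Simplified sketch: consume messages from a Kafka topic and append them to a
# newline-delimited file in HDFS in small batches.
from hdfs import InsecureClient   # pip install hdfs
from kafka import KafkaConsumer   # pip install kafka-python

BROKERS = ["broker1.example.com:9092"]
TOPIC = "web-events"
NAMENODE_URL = "http://namenode.example.com:9870"
HDFS_FILE = "/data/web_events/events.jsonl"
BATCH_SIZE = 500

consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers=BROKERS,
    group_id="hdfs-sink",
    auto_offset_reset="earliest",
    value_deserializer=lambda raw: raw.decode("utf-8"),
)
hdfs_client = InsecureClient(NAMENODE_URL, user="hadoop")

# Create the target file once if it does not exist, so appends succeed.
if hdfs_client.status(HDFS_FILE, strict=False) is None:
    hdfs_client.write(HDFS_FILE, data="", encoding="utf-8")

batch = []
for message in consumer:
    batch.append(message.value)
    if len(batch) >= BATCH_SIZE:
        # Append one newline-delimited record per consumed message.
        with hdfs_client.write(HDFS_FILE, append=True, encoding="utf-8") as writer:
            writer.write("\n".join(batch) + "\n")
        batch.clear()
```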
Environment: Cloudera, Hadoop, MapReduce, HDFS, Hive, Python, Sqoop, HBase, Pig, Oozie, Storm, Kerberos, Java, Linux, Shell Scripting
Confidential
Hadoop Developer
Responsibilities:
- Worked on a live 16-node Hadoop cluster running CDH 4.4.
- Performed Flume and Sqoop imports of data from the data warehouse platform to HDFS.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Implemented Data classification algorithms using Map reduce design patterns.
- Expertise in designing and deploying Hadoop clusters and various big data analytic tools, including Pig, Hive, HBase, Oozie, Sqoop, Flume, Spark, and Impala, with the Cloudera distribution.
- Used Pig to do data transformations, event joins, filter and some pre-aggregations before storing the data into HDFS.
- Supported MapReduce programs running on the cluster.
- Extracted data from Teradata into HDFS using Sqoop (see the sketch after this list).
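Hedged sketch of how a Teradata-to-HDFS Sqoop import of this kind might be driven from Python; the JDBC URL, credentials, table, and target directory are placeholder assumptions, and the Teradata JDBC driver must be available to Sqoop.

```python
# Sketch: invoke a Sqoop import from Python via subprocess and fail loudly on error.
import subprocess

SQOOP_CMD = [
    "sqoop", "import",
    "--connect", "jdbc:teradata://teradata.example.com/DATABASE=edw",
    "--driver", "com.teradata.jdbc.TeraDriver",
    "--username", "etl_user",
    "--password-file", "/user/etl_user/.password",  # avoid passwords on the command line
    "--table", "CUSTOMER_TXN",
    "--target-dir", "/data/raw/customer_txn",
    "--num-mappers", "4",
    "--split-by", "TXN_ID",
]

result = subprocess.run(SQOOP_CMD, capture_output=True, text=True)
if result.returncode != 0:
    raise RuntimeError(f"Sqoop import failed:\n{result.stderr}")
print("Sqoop import completed")
```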
Environment: Cloudera, Hadoop, MapReduce, HDFS, Hive, Sqoop, HBase, Linux, XML, MySQL, Java 6 Eclipse
Confidential
Software Engineer
Responsibilities:
- Responsible for importing data to HDFS using Sqoop from different RDBMS servers and exporting data using Sqoop to the RDBMS servers after aggregations for other ETL operations.
- Collected and aggregated web log data from different sources, such as web servers and mobile devices, using Apache Flume, and stored the data in HDFS/HBase for analysis.
- Developed an automated shell-script process that drives the data ingestion.
Environment: Hadoop, HDFS, Hive, Flume, HBase, Sqoop, PIG