Big Data Developer Consultant Resume
Charlotte, North Carolina
PROFESSIONAL SUMMARY:
- Over 8 years of professional IT experience, including the Big Data ecosystem and Data Science.
- Excellent experience with Hadoop architecture and its components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
- Analyzed and processed complex data sets using advanced querying, visualization and analytics tools.
- Identified, measured and recommended improvement strategies for KPIs across all business areas
- Experience using Talend Integration Suite (6.1/5.x) / Talend Open Studio (6.1/5.x)
- Experience with Talend Admin Console (TAC).
- Sound exposure to the retail market, including Retail Delivery Systems, as well as financial and medical insurance environments.
- Managed Amazon Redshift clusters, including launching clusters with specified node configurations and running data analysis queries.
- Managed a Kafka cluster on Hortonworks for streaming IoT data.
- Experience using cloud components and connectors to make API calls for accessing data from cloud storage (Google Drive, Salesforce, Amazon S3, Dropbox) in Talend Open Studio.
- Hands-on experience in installing, configuring, and using Hadoop ecosystem components such as MapReduce, HDFS, HBase, Hive, Sqoop, Pig, ZooKeeper, and Flume.
- Worked on ORC, Avro, and Parquet formats and their compression codecs.
- Good exposure to Apache Hadoop MapReduce programming, Pig scripting, distributed applications, and HDFS.
- Good knowledge of Hadoop cluster architecture and cluster monitoring.
- In-depth understanding of data structures and algorithms.
- Applied performance tuning techniques for Hive and Impala tables.
- Experience in writing Python scripts for data ingestion into Hadoop from various sources (see the sketch after this list).
- Excellent understanding and knowledge of NoSQL databases such as MongoDB, HBase, and Cassandra.
- Experience using the Airflow platform to monitor and retrofit data pipelines.
- Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa.
- Experience in managing Hadoop clusters using the Cloudera Manager tool.
- Experience working with the MapR Control System (MCS) tool in the MapR Hadoop distribution.
- Very good experience across the complete project life cycle: design, development, testing, and implementation of client-server and web applications.
- Experience in administering Red Hat Linux: installation, configuration, troubleshooting, security, backup, performance monitoring, and fine-tuning.
- Extensive experience working with Oracle, DB2, SQL Server, and MySQL databases.
- Hands-on experience with VPN, PuTTY, WinSCP, Cyberduck, FileZilla, etc.
- Wrote scripts to deploy monitors and checks and to automate critical system administration functions.
- Hands-on experience in application development using Java, RDBMS, and Linux shell scripting.
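Illustrative sketch of the Python-to-Hadoop ingestion pattern mentioned above, assuming a WebHDFS endpoint and the hdfs client library; the NameNode URL, user, and directory paths are placeholder assumptions, not values from any client environment.

```python
# Minimal sketch: push local CSV extracts into HDFS over WebHDFS.
# The endpoint, user, and both directories are illustrative placeholders.
from pathlib import Path

from hdfs import InsecureClient  # pip install hdfs

NAMENODE_URL = "http://namenode.example.com:9870"  # assumed WebHDFS endpoint
HDFS_LANDING_DIR = "/data/landing"                 # assumed HDFS target directory
LOCAL_SOURCE_DIR = Path("/var/data/exports")       # assumed local source directory


def ingest_csv_files() -> None:
    """Upload every CSV in the local source directory to the HDFS landing zone."""
    client = InsecureClient(NAMENODE_URL, user="hadoop")
    client.makedirs(HDFS_LANDING_DIR)
    for local_file in LOCAL_SOURCE_DIR.glob("*.csv"):
        target = f"{HDFS_LANDING_DIR}/{local_file.name}"
        client.upload(target, str(local_file), overwrite=True)
        print(f"Uploaded {local_file} -> {target}")


if __name__ == "__main__":
    ingest_csv_files()
```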
SKILLS:
Hadoop Ecosystem Development: Sqoop, MapReduce, Hive, Pig, Flume, Oozie, Zookeeper, HBase, Spark, Storm, Kafka, Drill, HDFS
Data Science Tools: RStudio, Zeppelin, Jupyter
ETL Tools: Talend, Datameer
Operating Systems: Linux, Windows XP, Windows Server 2003, Windows Server 2008, Red Hat, Ubuntu
Databases: MySQL, Oracle, MS SQL Server, DB2, MS Access, HBase, MongoDB
Languages: Python, SQL, Pig Latin, UNIX shell scripting
Cloud: AWS, Redshift, EMR, EC2, S3, VPC, VPN, Load Balancer, Azure, HDInsight
PROFESSIONAL EXPERIENCE:
Confidential, Charlotte, North Carolina
Big Data Developer Consultant
Responsibilities:
- Launched a 5-node MapR 5.2 cluster with 80 cores and 600 GB of RAM on AWS to implement a data recommendation engine.
- Installed Spark, Hive, Oozie, RStudio, and Zeppelin on the MapR Hadoop cluster.
- Mentored sophisticated organizations on large-scale data and analytics using advanced statistical and machine learning models.
- Architected and implemented analytics and visualization components for a device data analysis platform to predict hardware failures.
- Wrote Python and Scala APIs to move data from the client's RDBMS to AWS S3, implemented on the Spark execution framework (see the first sketch after this list).
- Worked intensively with Spark Shell and PySpark.
- Experience in performance tuning and query optimization in AWS Redshift
- Performed data manipulations using various Talend components such as tMap, tJavaRow, tJava, tOracleRow, tOracleInput, tOracleOutput, tMSSqlInput, and many more.
- Worked on the Talend RTX ETL tool; developed and scheduled jobs in Talend Integration Suite.
- Responsible for tuning ETL mappings, workflows, and the underlying data model to optimize load and query performance.
- Created Spark DataFrames while loading data into Hive tables.
- Used Avro and Parquet formats with compression codecs in Hive and Impala.
- Used a Kafka cluster to load IoT data onto the Hadoop cluster and designed an IoT data lake around a Kafka cluster on Azure.
- Created a VPG connection from the client network to the Hadoop cluster on AWS.
- Used Sqoop to build the data lake from MS SQL Server to the Hadoop cluster and scheduled jobs using crontab and Oozie.
- Built a data pipeline from an FTP server to Hadoop using a Python API (see the second sketch after this list).
- Connected Hive tables to RStudio and Zeppelin through sparklyr and JDBC connections, and gave the data science team the data infrastructure access needed to run their models in notebooks on the AWS cloud.
- Connected Hive tables on the Hadoop cluster to visualization tools such as Power BI and Tableau using ODBC and the MapR Drill connector.
- Used S3 as backup storage for the Hadoop cluster.
- Implemented a POC for launching Hadoop and Spark clusters on HDInsight.
- Developed Spark jobs for the recommendation engine and validated results using Python scripts.
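First sketch, illustrating the RDBMS-to-S3 pattern described in this role: a PySpark job reads a table over JDBC and writes it to S3 as Parquet. The JDBC URL, credentials, table, bucket, and partitioning column are assumptions, and the appropriate JDBC driver jar must be on the Spark classpath.

```python
# Minimal PySpark sketch: read an RDBMS table over JDBC and land it on S3 as Parquet.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("rdbms-to-s3")
    .getOrCreate()
)

# Assumed source: an MS SQL Server table exposed over JDBC.
source_df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://rdbms.example.com:1433;databaseName=sales")
    .option("dbtable", "dbo.orders")
    .option("user", "etl_user")
    .option("password", "***")
    .option("numPartitions", 8)             # parallel JDBC reads
    .option("partitionColumn", "order_id")  # numeric column to split on
    .option("lowerBound", 1)
    .option("upperBound", 10_000_000)
    .load()
)

# Assumed target: an S3 bucket backing the data lake, written as Parquet.
source_df.write.mode("overwrite").parquet("s3a://example-datalake/raw/orders/")
```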
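Second sketch, illustrating the FTP-to-Hadoop pipeline: ftplib pulls files from the FTP server and the hdfs client library writes them over WebHDFS. Host names, credentials, and paths are placeholder assumptions.

```python
# Minimal sketch: copy every file in an FTP directory into HDFS over WebHDFS.
import ftplib
import io

from hdfs import InsecureClient  # pip install hdfs

FTP_HOST = "ftp.example.com"
FTP_DIR = "/outbound"
NAMENODE_URL = "http://namenode.example.com:9870"
HDFS_TARGET_DIR = "/data/ftp_drops"


def ftp_to_hdfs() -> None:
    hdfs_client = InsecureClient(NAMENODE_URL, user="hadoop")
    hdfs_client.makedirs(HDFS_TARGET_DIR)

    with ftplib.FTP(FTP_HOST, user="ftpuser", passwd="***") as ftp:
        ftp.cwd(FTP_DIR)
        for name in ftp.nlst():
            buffer = io.BytesIO()
            ftp.retrbinary(f"RETR {name}", buffer.write)  # download into memory
            buffer.seek(0)
            # Stream the downloaded bytes straight into an HDFS file.
            with hdfs_client.write(f"{HDFS_TARGET_DIR}/{name}", overwrite=True) as writer:
                writer.write(buffer.read())


if __name__ == "__main__":
    ftp_to_hdfs()
```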
Environment: AWS, Spark, Python, Azure, Jupyter, Zeppelin, Talend Open Studio (TOS), Talend 6.1/5.6, Redshift, Hadoop, Sqoop, Scala, Hive, Flume, HBase, Kafka, Pig, Java, Shell Scripting, Unix, MySQL, MS SQL, Ubuntu, Zookeeper
Confidential, Jacksonville, Florida
Hadoop Developer
Responsibilities:
- Created Hive and HBase tables using the ORC file format and Snappy compression.
- Integrated Apache Kafka for data ingestion
- Set up and installed a 20-node HDP cluster used to run the client's matching algorithm; migrated 100 TB of data and all Spark and Hive jobs.
- Wrote different Pig scripts to clean up the ingested data and created partitions for the daily data.
- Conducted and moderated a meetup for 60 big data enthusiasts from the medical equipment industry, educating them on HDP and Hadoop.
- Involved in an R&D project to develop a business intelligence/reporting tool for Florida Blue.
- Implemented partitioning and bucketing for Hive tables based on requirements (see the sketch after this list).
- Performed various data ingestion tasks from DB2 to HDFS using the Sqoop JDBC connector.
- Created shell scripts to parameterize the Pig, Hive actions in Oozie workflow.
- Designed and implemented Incremental Imports into Hive tables.
- Created Hive tables in different formats; evaluated ORC and Avro and chose ORC as the best fit for the requirements.
- Managed and reviewed the Hadoop log files.
- Provisioned and managed a multi-tenant Cassandra cluster in a public cloud environment.
- Created Hive external and managed tables, compared their performance, and settled on external tables.
- Created and maintained Technical documentation for launching Hadoop Clusters and for executing Pig scripts.
- Worked on aggregating the data using advanced aspects of Map-Reduce.
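Illustrative sketch of the partitioning and bucketing pattern referenced above, expressed with the PySpark DataFrame writer against the Hive metastore; the database, table, column names, and bucket count are assumptions, and Spark records its own bucketing metadata, which is not identical to native Hive bucketing.

```python
# Minimal PySpark sketch: write a DataFrame as a partitioned, bucketed table
# registered in the Hive metastore.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-partitioning-example")
    .enableHiveSupport()  # register tables in the Hive metastore
    .getOrCreate()
)

# Assumed staging table holding the raw daily load.
claims = spark.table("claims_db.claims_staging")

(
    claims.write
    .partitionBy("load_date")   # prune daily queries to a single partition
    .bucketBy(32, "member_id")  # speed up joins/filters on member_id
    .sortBy("member_id")
    .format("orc")
    .mode("overwrite")
    .saveAsTable("claims_db.claims_orc")
)
```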
Environment: Kafka, Hortonworks, Hadoop, MapReduce, HDFS, Hive, Hortonworks/Cloudera Hadoop distributions, Pig, HBase, Linux, XML, MySQL, Java 6, Eclipse, Oracle 10g, PL/SQL, SQL*Plus, Python
Confidential, Mclean, Virginia
Hadoop Developer
Responsibilities:
- Imported data from Teradata systems to AWS S3 using Data Transfer and MapReduce.
- Experience with Hadoop data ingestion using the DataStage ETL tool and Hadoop transformations.
- Worked on the built-in Quantum application, where workflows were run as Spark applications.
- Experienced with Linux operating system and shell scripting.
- Worked on Apache Spark for real time and batch processing.
- Developed MapReduce code in Java and Spark SQL/Streaming jobs for faster testing and processing of data.
- Used Kibana and Elasticsearch to handle log messages produced by multiple systems.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Implemented MongoDB and set up Mongo components to write data to MongoDB and S3 simultaneously.
- Used GitHub as the version control tool.
- Developed scalable modular software packages for various APIs and applications.
- Developed a data pipeline using Kafka and Storm to store data in HDFS (see the sketch after this list).
- Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce.
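Simplified sketch of the Kafka-to-HDFS leg of the pipeline referenced above (the Storm topology itself is omitted), using kafka-python and the hdfs client library; broker, topic, group, and path names are placeholder assumptions.

```python
# Simplified sketch: consume messages from a Kafka topic and append them to a
# newline-delimited file in HDFS in small batches.
from hdfs import InsecureClient   # pip install hdfs
from kafka import KafkaConsumer   # pip install kafka-python

BROKERS = ["broker1.example.com:9092"]
TOPIC = "web-events"
NAMENODE_URL = "http://namenode.example.com:9870"
HDFS_FILE = "/data/web_events/events.jsonl"
BATCH_SIZE = 500

consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers=BROKERS,
    group_id="hdfs-sink",
    auto_offset_reset="earliest",
    value_deserializer=lambda raw: raw.decode("utf-8"),
)
hdfs_client = InsecureClient(NAMENODE_URL, user="hadoop")

# Create the target file once if it does not exist, so appends succeed.
if hdfs_client.status(HDFS_FILE, strict=False) is None:
    hdfs_client.write(HDFS_FILE, data="", encoding="utf-8")

batch = []
for message in consumer:
    batch.append(message.value)
    if len(batch) >= BATCH_SIZE:
        # Append one newline-delimited record per consumed message.
        with hdfs_client.write(HDFS_FILE, append=True, encoding="utf-8") as writer:
            writer.write("\n".join(batch) + "\n")
        batch.clear()
```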
Environment: Cloudera, Hadoop, MapReduce, HDFS, Hive, Python, Sqoop, HBase, Pig, Oozie, Storm, Kerberos, Java, Linux, Shell Scripting
Confidential
Hadoop Developer
Responsibilities:
- Worked on a live 16-node Hadoop cluster running CDH 4.4.
- Performed Flume and Sqoop imports of data from the data warehouse platform to HDFS.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Implemented Data classification algorithms using Map reduce design patterns.
- Expertise in designing and deploying Hadoop clusters and various big data analytic tools, including Pig, Hive, HBase, Oozie, Sqoop, Flume, Spark, and Impala, with the Cloudera distribution.
- Used Pig to do data transformations, event joins, filter and some pre-aggregations before storing the data into HDFS.
- Supported MapReduce programs running on the cluster.
- Extracted data from Teradata into HDFS using Sqoop (see the sketch after this list).
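Hedged sketch of how a Teradata-to-HDFS Sqoop import of this kind might be driven from Python; the JDBC URL, credentials, table, and target directory are placeholder assumptions, and the Teradata JDBC driver must be available to Sqoop.

```python
# Sketch: invoke a Sqoop import from Python via subprocess and fail loudly on error.
import subprocess

SQOOP_CMD = [
    "sqoop", "import",
    "--connect", "jdbc:teradata://teradata.example.com/DATABASE=edw",
    "--driver", "com.teradata.jdbc.TeraDriver",
    "--username", "etl_user",
    "--password-file", "/user/etl_user/.password",  # avoid passwords on the command line
    "--table", "CUSTOMER_TXN",
    "--target-dir", "/data/raw/customer_txn",
    "--num-mappers", "4",
    "--split-by", "TXN_ID",
]

result = subprocess.run(SQOOP_CMD, capture_output=True, text=True)
if result.returncode != 0:
    raise RuntimeError(f"Sqoop import failed:\n{result.stderr}")
print("Sqoop import completed")
```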
Environment: Cloudera, Hadoop, MapReduce, HDFS, Hive, Sqoop, HBase, Linux, XML, MySQL, Java 6 Eclipse
Confidential
Software Engineer
Responsibilities:
- Responsible for importing data to HDFS using Sqoop from different RDBMS servers and exporting data using Sqoop to the RDBMS servers after aggregations for other ETL operations.
- Collected and aggregated web log data from different sources, such as web servers and mobile devices, using Apache Flume, and stored the data in HDFS/HBase for analysis.
- Developed an automated shell-script process that drives the data ingestion.
Environment: Hadoop, HDFS, Hive, Flume, HBase, Sqoop, PIG